All About Scripts
What is a Script?
Scripts are specific elements that are part of a LOST annotation
pipeline. A script element is implemented as a python3 module. The
listing below <aascripts-anno-all-imgs>
shows an example of such a script, which requests image
annotations for all images of a dataset.
Listing 1: An example LOST script (included from ../../../backend/lost/pyapi/examples/pipes/mia/anno_all_imgs.py).
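Since the included file is not shown here, the following is a minimal sketch of what such a script might look like, assembled from the API calls described in this section (it is not the verbatim contents of anno_all_imgs.py):

from lost.pyapi import script
import os

class AnnoAllImgs(script.Script):
    '''Request annotations for all images of the connected datasources.'''
    def main(self):
        for ds in self.inp.datasources:
            for img_file in os.listdir(ds.path):
                # send an annotation request for each image to all
                # connected annotation tasks
                self.outp.request_annos(os.path.join(ds.path, img_file))

if __name__ == "__main__":
    AnnoAllImgs()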
In order to implement a script you need to create a python class that
inherits from lost.pyapi.script.Script. Your class needs to implement a
main method and needs to be instantiated within your python script. The
listing below <aascripts-hello-world>
shows a minimal example of a script.
from lost.pyapi import script

class MyScript(script.Script):
    def main(self):
        self.logger.info('Hello World!')

if __name__ == "__main__":
    MyScript()
Example Scripts
More script examples can be found here: lost/backend/lost/pyapi/examples/pipes
The LOST PyAPI Script Model
Like all pipeline elements, a script has an input and an output
object. Via these objects it is connected to other elements in a
pipeline (see also aapipelines-pipe-def-files).
Inside a script you can exchange information with the connected elements
by using the self.inp <lost.pyapi.inout.Input> object and the
self.outp <lost.pyapi.inout.ScriptOutput> object.
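As a rough sketch (the class name ConnectScript is illustrative, and only calls introduced in the following sections are used), both objects typically appear together inside main:

from lost.pyapi import script

class ConnectScript(script.Script):
    def main(self):
        # read from elements connected to the script's input ...
        for ds in self.inp.datasources:
            self.logger.info('Connected datasource: {}'.format(ds.path))
        # ... and hand new work to connected elements via the output,
        # e.g. with self.outp.request_annos(...) as described below

if __name__ == "__main__":
    ConnectScript()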
Reading Imagesets
It is a common pattern to read a path to an imageset from a
Datasource element in your annotation pipeline. See
Listing 3 <aascripts-reading-images> for
a code example. Since multiple Datasources could be connected to our
script, we iterate over all connected Datasources of the input with
self.inp.datasources <lost.pyapi.inout.Input.datasources>. For each Datasource element we can read the
path attribute to get the filesystem path to a folder with images.
from lost.pyapi import script
import os

class MyScript(script.Script):
    def main(self):
        for ds in self.inp.datasources:
            for img_file in os.listdir(ds.path):
                img_path = os.path.join(ds.path, img_file)
                # ... do something with img_path, e.g. request annotations

if __name__ == "__main__":
    MyScript()
Requesting Annotations
The most important feature of the LOST PyAPI is the ability to request
annotations for a connected AnnotationTask element. Inside a
Script you can access the output element and call the
self.outp.request_annos <lost.pyapi.inout.ScriptOutput.request_annos> method (see
Listing 4 <aascripts-request-annos>).
self.outp.request_annos(img_path)
Sometimes you also want to send annotation proposals to an
AnnotationTask in order to support your annotator. In most cases
these proposals will be generated by an AI, like an object detector. The
listing below <aascripts-request-anno-proposals> shows a simple example that sends a dummy box and a dummy point
to an annotation tool.
self.outp.request_annos(img_path,
                        annos=[[0.1, 0.1, 0.2, 0.2], [0.1, 0.2]],
                        anno_types=['bbox', 'point'])
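Note that annos and anno_types are parallel lists: each entry of anno_types determines how the corresponding list in annos is interpreted, so the four values above form a box and the two values a point. The coordinates are given as relative image coordinates (values between 0 and 1).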
Annotation Broadcasting
If multiple AnnoTask elements are connected to your
ScriptOutput <lost.pyapi.inout.ScriptOutput> and you call
self.outp.request_annos <lost.pyapi.inout.ScriptOutput.request_annos>, the annotation request will be broadcast to all
connected AnnoTasks, so each AnnoTask will get its own copy of
your annotation request. Technically, an empty
ImageAnno <lost.db.model.ImageAnno> is created per AnnoTask for each
annotation request. During the annotation process this
ImageAnno will be filled with information.
Reading Annotations
Another important task is to read annotations from previous pipeline
elements. In most cases this will be
AnnoTask <lost.pyapi.pipe_elements.AnnoTask> elements.
If you want to read all annotations at the
script input <lost.pyapi.inout.Input> in a vectorized way, you can use
self.inp.to_df() <lost.pyapi.inout.Input.to_df> to get a pandas
DataFrame
or self.inp.to_vec() <lost.pyapi.inout.Input.to_vec> to get a list of lists.
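For example, a minimal sketch to inspect what arrived at the input (the exact columns of the DataFrame depend on the connected elements):

# collect all annotations at the script input in a vectorized form
df = self.inp.to_df()
self.logger.info('Received {} annotation rows'.format(len(df)))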
If you prefer to iterate over all
ImageAnnos <lost.db.model.ImageAnno> you can use the respective iterator
self.inp.img_annos <lost.pyapi.inout.Input.img_annos>. See the
listing below <aascripts-read-annos> for
an example.
for img_anno in self.inp.img_annos:
    for twod_anno in img_anno.twod_annos:
        self.logger.info('image path: {}, 2d_anno_data: {}'.format(
            img_anno.img_path, twod_anno.data))
Contexts to Store Files
There are three different contexts that can be used to store files that
should be handled by your script. Each context is modeled as a specific
folder in the LOST filesystem. In order to get the path to a context,
call
self.get_path <lost.pyapi.script.Script.get_path>.
Listing 6 <aascripts-context-example>
shows an application of
self.get_path <lost.pyapi.script.Script.get_path> in order to get the path to the instance context.
Listing 6: Create a csv file and store this file to the instance context (included from ../../../backend/lost/pyapi/examples/pipes/sia/export_csv.py).
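Since the included file is not shown here, the following is a minimal sketch of the pattern it demonstrates (the file name annos.csv is illustrative; it is not the verbatim contents of export_csv.py):

from lost.pyapi import script

class ExportCsv(script.Script):
    '''Export all annotations at the script input to a csv file.'''
    def main(self):
        # collect all annotations at the input as a pandas DataFrame
        df = self.inp.to_df()
        # resolve a file path inside this script instance's context folder
        csv_path = self.get_path('annos.csv', context='instance')
        df.to_csv(csv_path, index=False)

if __name__ == "__main__":
    ExportCsv()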
There are three types of contexts that can be accessed: instance, pipe and static.
The instance context is only accessible by the current instance of your script. Each time a pipeline is started, each script gets its own instance folder in the LOST filesystem. No other script in the same pipeline can access this folder.
If you want to exchange files among the script instances of a started
pipeline, you can choose the pipe context. When calling
self.get_path <lost.pyapi.script.Script.get_path> with context = 'pipe' you will get a path to a
folder that is available to all script instances of a pipeline instance.
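For example (the file name exchange.json is illustrative):

# a path in a folder shared by all script instances of this pipeline instance
exchange_path = self.get_path('exchange.json', context='pipe')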
The static context is a path to the pipeline project folder that
all script instances have access to. In this way you can access
files that you have provided inside the
Pipeline Project <aapipelines-pipe-projects>. For example, if you want to load a pretrained machine
learning model inside of your script, you can put it into the pipeline
project folder and access it via the static context:
path_to_model = self.get_path('pretrained_model.md5', context='static')
Logging
Each Script has its own
logger <lost.pyapi.script.Script.logger>. This logger is an instance of the standard python
logger. The
example below <aascripts-logging> shows
how to log an info message, a warning and an error. All logs are
redirected to a pipeline log file that can be downloaded via the
pipeline view inside the web gui.
self.logger.info('I am an info message')
self.logger.warning('I am a warning')
self.logger.error('An error occurred!')
Script Errors and Exceptions
If an error occurs in your script, the traceback of the exception will be visible in the web gui, when clicking on the respective script in your pipeline. The error will also be automatically logged to the pipeline log file.
Script ARGUMENTS
The ARGUMENTS variable is used to provide script arguments that
can be set when starting a pipeline within the web gui.
ARGUMENTS are defined as a dictionary of dictionaries. Each argument
dictionary has the keys value and help. As you can see in the
listing below <aascripts-arg-def>, the
first argument is called my_arg, its value is true and its help
text is A boolean argument.
ARGUMENTS = {'my_arg': {'value': 'true',
                        'help': 'A boolean argument.'}
            }
Within your script you can access the value of an argument with the
get_arg(...) <lost.pyapi.script.Script.get_arg> method as shown below.
if self.get_arg('my_arg').lower() == 'true':
    self.logger.info('my_arg was true')
Script ENVS
The ENVS variable provides meta information for the
pipeline engine <lost-ecosystem-pipe-engine> by defining a list of environments (similar to conda
environments) that this script may be executed in. In this way you can
assure that a script will only be executed in environments where all
your dependencies are installed. All environments are installed in
workers <lost-ecosystem-pipe-engine>
that may execute your script. If multiple environments are defined
within the ENVS list of a script, the pipeline engine will try to
assign the script to a worker in the order defined by the ENVS list:
if a worker that has the first environment in the list installed is
online, the pipeline engine will assign the script to this worker; if
no worker with the first environment is online, it will try to assign
the script to a worker with the second environment in the list, and so
on. Listing 11 <aascripts-env-def> shows an example of the ENVS definition in a script that
may be executed in two different environments.
ENVS = ['lost', 'lost-cv']
Script RESOURCES
Sometimes a script requires all resources of a worker, and therefore
no other script should be executed in parallel by the worker that
executes your script. This is often the case if you train an AI model
and need all GPU memory to do this. In those cases, you can define a
RESOURCES variable inside your python script and assign a list
containing the string lock_all to it. See the
listing below <aascripts-resources-def>
for an example:
RESOURCES = ['lock_all']
Debugging a Script
Most likely, when you import your pipeline and run it for the first time, some scripts will not work, since some tiny bug slipped into your code :-)
Inside the web GUI all exceptions and errors of your script will be visualized when clicking on the respective script element in the pipeline visualization. In this way you get a first hint about what's wrong.
In order to debug your code you need to log in to the docker container and find the instance folder that is created for each script instance. Inside this folder there is a bash script called debug.sh that needs to be executed in order to start the pudb debugger. You will find your script by its unique pipeline element id. The path to the script instance folder is /home/lost/app/debug/i-<pipe_element_id>.
# Log in to docker
docker exec -it lost bash
# Change directory to the instance path of your script
cd /home/lost/app/debug/i-<pipe_element_id>
# Start debugging
bash debug.sh
If your script requires a special ENV to be executed, you need to log in to a container that has this environment installed for debugging.