How to set up a large-scale project with Taskblaster?
All examples on the web pages use only the most basic building blocks of Taskblaster, such as the local tasks.py file. Here we set up a complete, installable Python project that contains your workflows. This also enables advanced features of Taskblaster, such as custom object encoding and decoding.
Download and execute (in a new folder) the following script, which creates the necessary files for a skeleton Taskblaster Python project together with a Python virtual environment, and pip installs the project into that environment in editable mode.
You may also want to run the script step by step to make sure that each step works.
$ chmod +x setup_full_project.sh && ./setup_full_project.sh
Obtaining file:///home/myuser/tmprepo/tb_demo_venv/tb_demo
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Checking if build backend supports build_editable: started
Checking if build backend supports build_editable: finished with status 'done'
Getting requirements to build editable: started
Getting requirements to build editable: finished with status 'done'
Preparing editable metadata (pyproject.toml): started
Preparing editable metadata (pyproject.toml): finished with status 'done'
Collecting taskblaster (from tb_demo==0.1)
Downloading taskblaster-0.1-py3-none-any.whl.metadata (1.1 kB)
Collecting click (from taskblaster->tb_demo==0.1)
Downloading click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Downloading taskblaster-0.1-py3-none-any.whl (87 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 87.2/87.2 kB 3.2 MB/s eta 0:00:00
Downloading click-8.1.7-py3-none-any.whl (97 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.9/97.9 kB 5.7 MB/s eta 0:00:00
Building wheels for collected packages: tb_demo
Building editable for tb_demo (pyproject.toml): started
Building editable for tb_demo (pyproject.toml): finished with status 'done'
Created wheel for tb_demo: filename=tb_demo-0.1-0.editable-py3-none-any.whl size=2628 sha256=8c56553de77f814f1aa9a5f093859f73c77a46543d1b7748bce2c26c9c5e06b4
Stored in directory: /tmp/pip-ephem-wheel-cache-le47v204/wheels/b5/b8/9b/b76a4ce638629191bca8e1be1bf94c75caf0932c84bff65bae
Successfully built tb_demo
Installing collected packages: click, taskblaster, tb_demo
Successfully installed click-8.1.7 taskblaster-0.1 tb_demo-0.1
The script creates the following file and directory structure:
$ cd tb_demo_venv/tb_demo && find -type f | grep -v "egg-info"
./pyproject.toml
./tb_demo/mysubmodule/tasks.py
./tb_demo/mysubmodule/__init__.py
./tb_demo/__init__.py
./tb_demo/main_workflow.py
There is a Python module called tb_demo in the folder with the same name. It is identified as a module because it contains an __init__.py file. Taskblaster can find it because it is installed: the script ran pip install -e . in its parent folder, which contains the pyproject.toml (the file that marks the folder as an installable package).
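Whether the editable install worked can be checked from Python itself. The following is a minimal sketch using only the standard library; the helper name is_importable is made up for this illustration and is not part of Taskblaster.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate a top-level module by name.

    Hypothetical helper for illustration; it only asks the import
    system for a module spec and does not import the module.
    """
    return importlib.util.find_spec(module_name) is not None


# 'json' ships with Python, so it can always be located.
print(is_importable('json'))

# Inside the activated virtual environment, the same check on
# 'tb_demo' tells whether the editable install is visible.
print(is_importable('tb_demo'))
```

If the second check fails inside the virtual environment, the pip install -e . step most likely did not complete, or a different Python interpreter is being used.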
Take a look at the pyproject.toml of the project.
Note that our package depends on Taskblaster, so running the pip install command automatically installs Taskblaster into our virtual environment from the Python Package Index (PyPI).
If you want to use your own Taskblaster cloned from GitLab, comment out the 'taskblaster' dependency and create the virtual environment with --system-site-packages to include your own Taskblaster installation (or alternatively, run pip install -e . in your Taskblaster clone while the virtual environment is active).
We may now initialize Taskblaster with the project's module. Instead of tb init we use the syntax tb init [MODULE].
Activate the virtual environment first:
source tb_demo_venv/bin/activate
$ tb init tb_demo
Created repository using module "tb_demo" in "/home/myuser/tmprepo".
Warning
Before running tb init tb_demo, make sure to change to a fresh folder (do not create a Taskblaster repository in your home folder or inside your virtual environment).
You may examine the tasks and the workflow. The new feature enabled by associating Taskblaster with a designated project module is user-defined encoding. Here we introduce the easier way, which is to create Python classes with tb_decode and tb_encode methods. This is how instances of arbitrary classes can be passed as parameters to tasks and returned as outputs of tasks.
$ cat tb_demo_venv/tb_demo/tb_demo/mysubmodule/__init__.py
import taskblaster as tb


class MyCustomObject:
    def __init__(self, parameter):
        self.parameter = parameter

    def tb_encode(self):
        return {'my_stored_parameter': self.parameter}

    @classmethod
    def tb_decode(cls, dct: dict):
        return cls(parameter=dct['my_stored_parameter'])

    def calculate(self):
        return f'Calculated using {self.parameter}.'


@tb.workflow
class MyWorkflow:
    @tb.task
    def create_object(self):
        return tb.node('tb_demo.mysubmodule.tasks.create_object')

    @tb.task
    def use_object(self):
        return tb.node('tb_demo.mysubmodule.tasks.use_object',
                       obj=self.create_object)
Such custom classes have many use cases. For example, a class may hold the parameters of a calculation while performing only certain steps of that calculation itself. The tasks used in this demo reflect this ability in the simplest possible manner.
$ cat tb_demo_venv/tb_demo/tb_demo/mysubmodule/tasks.py
from tb_demo.mysubmodule import MyCustomObject


def create_object():
    return MyCustomObject(parameter=42)


def use_object(obj):
    return obj.calculate()
Custom encoding and decoding also make it much easier to change parameters later, since the user controls both the encoding and the decoding.
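As an illustration of that control, the sketch below renames the stored key while still reading outputs written under the old name. It is a plain Python example that does not import Taskblaster; the new key name 'parameter' is hypothetical, and whether such a fallback suits a given project is an assumption.

```python
class MyCustomObject:
    def __init__(self, parameter):
        self.parameter = parameter

    def tb_encode(self):
        # New outputs are written under the (hypothetical) renamed key.
        return {'parameter': self.parameter}

    @classmethod
    def tb_decode(cls, dct: dict):
        # Prefer the new key, but fall back to the key that older
        # stored outputs used, so existing results remain readable.
        if 'parameter' in dct:
            return cls(parameter=dct['parameter'])
        return cls(parameter=dct['my_stored_parameter'])


# An output stored with the old key can still be decoded ...
old = MyCustomObject.tb_decode({'my_stored_parameter': 42})
# ... and re-encoding it produces the new layout.
new = MyCustomObject.tb_decode(old.tb_encode())
print(old.parameter, new.parameter)
```

Because the class owns both directions of the conversion, the stored layout can change without touching the tasks that consume the object.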
We can run the workflow.
$ tb workflow tb_demo_venv/tb_demo/tb_demo/main_workflow.py
entry: add new 0/0 tree/create_object
add new 0/1 tree/use_object
And execute its tasks.
$ tb run .
Starting worker rank=000 size=001
[rank=000 2024-08-19 12:59:46 N/A-0/1] Worker class: —
[rank=000 2024-08-19 12:59:46 N/A-0/1] Required tags: —
[rank=000 2024-08-19 12:59:46 N/A-0/1] Supported tags: —
[rank=000 2024-08-19 12:59:46 N/A-0/1] Main loop
Got task <taskblaster.worker.LoadedTask object at 0x7fc32c16eea0>
[rank=000 2024-08-19 12:59:46 N/A-0/1] Running create_object ...
[rank=000 2024-08-19 12:59:46 N/A-0/1] Task create_object finished in 0:00:00.001643
Got task <taskblaster.worker.LoadedTask object at 0x7fc32c1852e0>
[rank=000 2024-08-19 12:59:46 N/A-0/1] Running use_object ...
[rank=000 2024-08-19 12:59:46 N/A-0/1] Task use_object finished in 0:00:00.000293
[rank=000 2024-08-19 12:59:46 N/A-0/1] No available tasks, end worker main loop
It is instructive to look at the actual output.json of the create_object task.
$ cat tree/create_object/output.json
{"__tb_enc__": ["tb_demo.mysubmodule.MyCustomObject", {"my_stored_parameter": 42}]}
We see that the module path to the class has been stored; this is how Taskblaster knows where to look for the right encoder and decoder.
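To make the role of the stored module path concrete, here is a minimal sketch of how such a record could be written and read back with the standard json and importlib modules. This is not Taskblaster's actual implementation: the functions encode and decode and the class Point are made up for illustration, and only the {"__tb_enc__": [path, payload]} layout is taken from the output above.

```python
import importlib
import json


class Point:
    """Hypothetical class following the tb_encode/tb_decode convention."""

    def __init__(self, x):
        self.x = x

    def tb_encode(self):
        return {'x': self.x}

    @classmethod
    def tb_decode(cls, dct):
        return cls(x=dct['x'])


def encode(obj):
    """json 'default' hook: wrap obj as {"__tb_enc__": [path, payload]}."""
    cls = type(obj)
    path = f'{cls.__module__}.{cls.__qualname__}'
    return {'__tb_enc__': [path, obj.tb_encode()]}


def decode(dct):
    """json 'object_hook': rebuild an object from its stored dotted path."""
    if '__tb_enc__' not in dct:
        return dct  # ordinary dictionary, leave untouched
    path, payload = dct['__tb_enc__']
    module_name, _, class_name = path.rpartition('.')
    # Import the module named in the record and look up the class in it.
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls.tb_decode(payload)


text = json.dumps(Point(3), default=encode)
print(text)
restored = json.loads(text, object_hook=decode)
print(type(restored).__name__, restored.x)
```

The sketch also shows why the project has to be installed: the decoder can only import the class if the module named in the record is importable in the environment where the task runs.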