How to set up a large-scale project with Taskblaster

The examples elsewhere in this documentation use only the most basic building blocks of Taskblaster, such as a local tasks.py file. Here we set up a complete, installable Python project that contains your workflows. This also enables advanced Taskblaster features such as custom object encoding and decoding.

Download the following script and execute it in a new folder. It creates the files needed for a skeleton Taskblaster project and a Python virtual environment, and pip installs the project into that environment in editable mode. You may also want to run the script step by step to check that each step works.

$ chmod +x setup_full_project.sh && ./setup_full_project.sh
Obtaining file:///home/myuser/tmprepo/tb_demo_venv/tb_demo
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Checking if build backend supports build_editable: started
  Checking if build backend supports build_editable: finished with status 'done'
  Getting requirements to build editable: started
  Getting requirements to build editable: finished with status 'done'
  Preparing editable metadata (pyproject.toml): started
  Preparing editable metadata (pyproject.toml): finished with status 'done'
Collecting taskblaster (from tb_demo==0.1)
  Downloading taskblaster-0.1-py3-none-any.whl.metadata (1.1 kB)
Collecting click (from taskblaster->tb_demo==0.1)
  Downloading click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Downloading taskblaster-0.1-py3-none-any.whl (87 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 87.2/87.2 kB 3.2 MB/s eta 0:00:00
Downloading click-8.1.7-py3-none-any.whl (97 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.9/97.9 kB 5.7 MB/s eta 0:00:00
Building wheels for collected packages: tb_demo
  Building editable for tb_demo (pyproject.toml): started
  Building editable for tb_demo (pyproject.toml): finished with status 'done'
  Created wheel for tb_demo: filename=tb_demo-0.1-0.editable-py3-none-any.whl size=2628 sha256=8c56553de77f814f1aa9a5f093859f73c77a46543d1b7748bce2c26c9c5e06b4
  Stored in directory: /tmp/pip-ephem-wheel-cache-le47v204/wheels/b5/b8/9b/b76a4ce638629191bca8e1be1bf94c75caf0932c84bff65bae
Successfully built tb_demo
Installing collected packages: click, taskblaster, tb_demo
Successfully installed click-8.1.7 taskblaster-0.1 tb_demo-0.1

The script creates the following file and directory structure:

$ cd tb_demo_venv/tb_demo && find -type f | grep -v "egg-info"
./pyproject.toml
./tb_demo/mysubmodule/tasks.py
./tb_demo/mysubmodule/__init__.py
./tb_demo/__init__.py
./tb_demo/main_workflow.py

There is a Python module called tb_demo in the folder with the same name. It is identified as a module because it contains an __init__.py file. Taskblaster can find it because it is installed: pip install -e . was run in its parent folder, which holds the pyproject.toml that makes the package installable.
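A quick way to confirm that a package is importable from the active environment is to ask Python's import machinery directly. The sketch below uses the standard-library importlib.util.find_spec; the helper name is_importable is only illustrative and is not part of Taskblaster.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


# 'json' ships with Python, so it is always found.
print(is_importable('json'))
# Inside the activated virtual environment, after the editable install,
# this is expected to find the demo package as well.
print(is_importable('tb_demo'))
```

If the second check fails, the virtual environment is probably not activated or the editable install did not complete.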

Take a look at the pyproject.toml of the project. Note that our package depends on Taskblaster, so running the pip install command automatically installs Taskblaster into the virtual environment from the Python Package Index (PyPI). If you want to use your own Taskblaster cloned from GitLab, comment out the 'taskblaster' dependency and create the virtual environment with --system-site-packages to include your own Taskblaster installation (or, alternatively, pip install -e your Taskblaster clone into the virtual environment as well).

We may now initialize Taskblaster with the project's module. Instead of tb init we use the syntax tb init [MODULE].

Activate the virtual environment first:

$ source tb_demo_venv/bin/activate
$ tb init tb_demo
Created repository using module "tb_demo" in "/home/myuser/tmprepo".

Warning

Before running tb init tb_demo, make sure you are in a fresh folder. Do not create a Taskblaster repository in your home folder or inside your virtual environment.
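If you script your project setup, a small guard can catch these two mistakes before tb init is called. The following is only a sketch: the function name is_reasonable_repo_dir is hypothetical, and it assumes that sys.prefix points at the active virtual environment, which is the case when one is activated.

```python
from pathlib import Path
import sys


def is_reasonable_repo_dir(path: Path, home: Path, venv: Path) -> bool:
    """Reject the home folder itself and anything at or below the virtual environment."""
    path, home, venv = path.resolve(), home.resolve(), venv.resolve()
    if path == home:
        return False
    if path == venv or venv in path.parents:
        return False
    return True


# Check the current working directory against the home folder and the
# active environment before initializing a repository there.
print(is_reasonable_repo_dir(Path.cwd(), Path.home(), Path(sys.prefix)))
```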

You may examine the tasks and the workflow. The new feature enabled by associating Taskblaster with a designated project module is user-defined encoding. Here we introduce the easier way, which is to create Python classes with tb_encode and tb_decode methods. This lets instances of such classes be passed as parameters to tasks and returned as task outputs.

$ cat tb_demo_venv/tb_demo/tb_demo/mysubmodule/__init__.py
import taskblaster as tb

class MyCustomObject:
    def __init__(self, parameter):
        self.parameter = parameter

    def tb_encode(self):
        return {'my_stored_parameter': self.parameter}

    @classmethod
    def tb_decode(cls, dct: dict):
        return cls(parameter=dct['my_stored_parameter'])

    def calculate(self):
        return f'Calculated using {self.parameter}.'

@tb.workflow
class MyWorkflow:
    @tb.task
    def create_object(self):
        return tb.node('tb_demo.mysubmodule.tasks.create_object')

    @tb.task
    def use_object(self):
        return tb.node('tb_demo.mysubmodule.tasks.use_object', obj=self.create_object)

There are many use cases for such custom classes. For example, a class can carry the parameters of a calculation between tasks, while the actual calculation steps are performed only where its methods are called. The tasks used in this demo illustrate this in the simplest possible manner.

$ cat tb_demo_venv/tb_demo/tb_demo/mysubmodule/tasks.py
from tb_demo.mysubmodule import MyCustomObject

def create_object():
    return MyCustomObject(parameter=42)

def use_object(obj):
    return obj.calculate()

Custom encoding and decoding also makes it much easier to change parameters later, since the user controls both the encoding and the decoding.
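As an illustration of that control, a tb_decode method can accept records written by an older version of the class. In the sketch below the older key name 'parameter' is purely hypothetical and does not appear in the demo project; the point is only that the decoder decides how a stored dictionary maps onto constructor arguments.

```python
class MyCustomObject:
    def __init__(self, parameter):
        self.parameter = parameter

    def tb_encode(self):
        # Current storage format, as in the demo project.
        return {'my_stored_parameter': self.parameter}

    @classmethod
    def tb_decode(cls, dct: dict):
        # Hypothetical backward compatibility: fall back to an older key
        # name when the current one is missing.
        if 'my_stored_parameter' in dct:
            return cls(parameter=dct['my_stored_parameter'])
        return cls(parameter=dct['parameter'])


old_record = {'parameter': 7}  # hypothetical legacy format
new_record = MyCustomObject(42).tb_encode()
print(MyCustomObject.tb_decode(old_record).parameter)
print(MyCustomObject.tb_decode(new_record).parameter)
```

Because both formats decode to the same kind of object, previously stored task outputs would remain readable after the class is changed.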

We can run the workflow.

$ tb workflow tb_demo_venv/tb_demo/tb_demo/main_workflow.py
entry:         add new      0/0   tree/create_object 
               add new      0/1   tree/use_object 

And execute its tasks.

$ tb run .
Starting worker rank=000 size=001
[rank=000 2024-08-19 12:59:46 N/A-0/1] Worker class: —
[rank=000 2024-08-19 12:59:46 N/A-0/1] Required tags: —
[rank=000 2024-08-19 12:59:46 N/A-0/1] Supported tags: —
[rank=000 2024-08-19 12:59:46 N/A-0/1] Main loop
Got task <taskblaster.worker.LoadedTask object at 0x7fc32c16eea0>
[rank=000 2024-08-19 12:59:46 N/A-0/1] Running create_object ...
[rank=000 2024-08-19 12:59:46 N/A-0/1] Task create_object finished in 0:00:00.001643
Got task <taskblaster.worker.LoadedTask object at 0x7fc32c1852e0>
[rank=000 2024-08-19 12:59:46 N/A-0/1] Running use_object ...
[rank=000 2024-08-19 12:59:46 N/A-0/1] Task use_object finished in 0:00:00.000293
[rank=000 2024-08-19 12:59:46 N/A-0/1] No available tasks, end worker main loop

It is instructive to look at the actual output.json of the create_object task.

$ cat tree/create_object/output.json
{"__tb_enc__": ["tb_demo.mysubmodule.MyCustomObject", {"my_stored_parameter": 42}]}

We see that the module path to the class has been stored; this is how Taskblaster knows where to find the right encoder/decoder.
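To make the lookup concrete, here is a minimal sketch of how a dotted path like the one above can be resolved back to a class and handed to tb_decode. This is not Taskblaster's actual implementation: the function names encode_obj and decode_obj and the stand-in module demo_pkg are assumptions made so that the example is self-contained. Only the shape of the "__tb_enc__" entry is taken from the output shown above.

```python
import importlib
import json
import sys
import types


class MyCustomObject:
    def __init__(self, parameter):
        self.parameter = parameter

    def tb_encode(self):
        return {'my_stored_parameter': self.parameter}

    @classmethod
    def tb_decode(cls, dct):
        return cls(parameter=dct['my_stored_parameter'])


# Register a stand-in module (hypothetical name) so that the dotted path
# is importable in this standalone example.
demo_pkg = types.ModuleType('demo_pkg')
demo_pkg.MyCustomObject = MyCustomObject
MyCustomObject.__module__ = 'demo_pkg'
sys.modules['demo_pkg'] = demo_pkg


def encode_obj(obj) -> str:
    # Store the class location next to the data returned by tb_encode.
    cls = type(obj)
    path = f'{cls.__module__}.{cls.__name__}'
    return json.dumps({'__tb_enc__': [path, obj.tb_encode()]})


def decode_obj(text: str):
    # Split "package.module.ClassName" into module and class, import the
    # module, and let the class rebuild the object.
    path, payload = json.loads(text)['__tb_enc__']
    module_name, _, class_name = path.rpartition('.')
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls.tb_decode(payload)


text = encode_obj(MyCustomObject(42))
print(text)
print(decode_obj(text).parameter)
```

The design consequence is that the class must stay importable under the stored path: moving or renaming it would leave already stored outputs pointing at a location that no longer exists.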