Hello World

The first thing to do is to create a Taskblaster repository. This can be done with the tb init command:

$ tb init
Created repository using module "taskblaster.repository" in "/home/myuser/tmprepo".

This creates a directory called tree/, which will eventually contain all of the tasks. It also creates a hidden .taskblaster directory containing the registry database, where metadata such as the state of each task is stored for efficient access. Never edit the files in this directory by hand.
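As a small illustration of the layout described above, the following sketch checks whether a directory contains the two entries that tb init creates. The helper name missing_repo_paths is hypothetical and not part of Taskblaster; the demonstration runs on a temporary directory that only mimics the layout, so it does not require Taskblaster to be installed.

```python
from pathlib import Path
import tempfile


def missing_repo_paths(root):
    """Return the expected repository directories that are absent under root.

    Hypothetical helper for illustration only; it is not a Taskblaster API.
    """
    root = Path(root)
    expected = [root / 'tree', root / '.taskblaster']
    return [path for path in expected if not path.is_dir()]


# Demonstration on a throwaway directory that mimics the layout.
with tempfile.TemporaryDirectory() as tmp:
    before = missing_repo_paths(tmp)      # nothing created yet
    (Path(tmp) / 'tree').mkdir()
    (Path(tmp) / '.taskblaster').mkdir()
    after = missing_repo_paths(tmp)       # both directories now exist

print(len(before), len(after))
```

In a real repository, tb info remains the authoritative way to see these paths.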

You may view several other important paths using the tb info command:

$ tb info
Module:     taskblaster.repository
Code:       /home/docs/checkouts/readthedocs.org/user_builds/taskblaster/envs/latest/lib/python3.12/site-packages/taskblaster/repository.py
Root:       /home/myuser/tmprepo
Tree:       /home/myuser/tmprepo/tree
Registry:   /home/myuser/tmprepo/.taskblaster/registry.db (0 entries)
Pythonpath: /home/myuser/tmprepo/src
Tasks:      /home/myuser/tmprepo/tasks.py (not created)
Resources:  /home/myuser/tmprepo/resources.py (not created)
Read only:  False

A simple workflow with a single task

Create a file called workflow.py:

import taskblaster as tb


@tb.workflow
class Workflow:
    greeting = tb.var(default='hello')
    whom = tb.var()

    @tb.task
    def hello(self):
        return tb.node('greet', greeting=self.greeting, whom=self.whom)


def workflow(runner):
    runner.run_workflow(Workflow(whom='world'))

Here the class Workflow is the workflow that will be executed. It contains a single task, hello, which runs the Python function greet with the keyword arguments greeting and whom taken from the input variables of the workflow. The function greet is assumed to be located in a file tasks.py in the repository root (the Tasks path reported by tb info). The function workflow specifies that Workflow should be executed with the default value for greeting and with whom='world'.
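To make the role of the default value concrete, here is a plain-Python analogy that does not use Taskblaster at all. The dataclass WorkflowInputs is a hypothetical stand-in for the two tb.var declarations; it only shows how a default for greeting and an explicit value for whom combine into one set of keyword arguments.

```python
from dataclasses import dataclass, asdict


@dataclass
class WorkflowInputs:
    # Hypothetical stand-in for the tb.var declarations; not Taskblaster code.
    whom: str                 # no default, so the dataclass requires it
    greeting: str = 'hello'   # used when no value is supplied


# Mirrors Workflow(whom='world'): greeting falls back to its default.
inputs = WorkflowInputs(whom='world')
kwargs = asdict(inputs)
print(kwargs)
```

The resulting keyword arguments are the ones the greet function will receive. How Taskblaster itself reports a missing variable is not covered by this analogy.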

The next step is thus to create the file tasks.py containing the function greet:

def greet(greeting, whom):
    return f'{greeting}, {whom}!'

You can now run the workflow:

$ tb workflow workflow.py
entry:                    add  new      0/0        tree/hello 

This adds the task hello to the tree. To see which tasks exist at any time, use:

$ tb ls
state    info       tags        worker         time     folder
──────── ────────── ─────────── ─────────── ─────────── ─────────────────────────────
new      0/0                                            tree/hello

You can see that a task hello has been created and added to the tree.

Use tb view <path to view> to see more detailed information about tasks:

$ tb view .
name: hello
  location:        /home/myuser/tmprepo/tree/hello
  state:           new
  target:          greet(…)
  wait for:        0 dependencies
  depth:           0
  source workflow: <root workflow>
  frozen by: (not frozen)

  latest handled inputs:
     None

  handlers:
     <None>

  handler data:
    <None>

  parents:
    <task has no dependencies>

  input:
    ["greet", {"greeting": "hello", "whom": "world"}]

  output:
    <task not finished yet>

Task has no run information
No custom actions defined for this task.

Here you can see, for example, the state of the task and the input arguments that were provided. Note that the task has only been added to the tree and has not yet been executed (it is in the new state).
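The input line in the listing is a JSON-style pair of a target name and its keyword arguments. The sketch below decodes that line and applies it to greet by hand. The TARGETS dictionary is a hypothetical stand-in for tasks.py, and this is only an illustration of what the input means, not a description of how Taskblaster dispatches tasks internally.

```python
import json


def greet(greeting, whom):
    return f'{greeting}, {whom}!'


# Hypothetical lookup table standing in for the contents of tasks.py.
TARGETS = {'greet': greet}

# The input line exactly as printed by tb view.
shown_input = '["greet", {"greeting": "hello", "whom": "world"}]'

# Split the pair into the target name and its keyword arguments.
target_name, kwargs = json.loads(shown_input)
result = TARGETS[target_name](**kwargs)
print(result)  # hello, world!
```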

To execute the task you need to run it using the tb run command:

$ tb run .
Starting worker rank=000 size=001
[rank=000 2025-03-17 10:08:36 N/A-0/1] Worker class: —
[rank=000 2025-03-17 10:08:36 N/A-0/1] Required tags: —
[rank=000 2025-03-17 10:08:36 N/A-0/1] Supported tags: —
[rank=000 2025-03-17 10:08:36 N/A-0/1] name: None
    tags: —
    required_tags: —
    resources: None
    max_tasks: None
    subworker_size: None
    subworker_count: None
    wall_time: None
[rank=000 2025-03-17 10:08:36 N/A-0/1] Main loop
[rank=000 2025-03-17 10:08:36 N/A-0/1] Running hello ...
[rank=000 2025-03-17 10:08:36 N/A-0/1] Task hello finished in 0:00:00.000880
[rank=000 2025-03-17 10:08:36 N/A-0/1] No available tasks, end worker main loop

tb ls shows that the task is now in the done state:

$ tb ls
state    info       tags        worker         time     folder
──────── ────────── ─────────── ─────────── ─────────── ─────────────────────────────
done     0/0                    N/A-0/1        00:00:00 tree/hello

You can now view the output:

$ tb view .
name: hello
  location:        /home/myuser/tmprepo/tree/hello
  state:           done
  target:          greet(…)
  wait for:        0 dependencies
  depth:           0
  source workflow: <root workflow>
  frozen by: (not frozen)

  latest handled inputs:
     None

  handlers:
    []

  handler data:
    <None>

  parents:
    <task has no dependencies>

  input:
    ["greet", {"greeting": "hello", "whom": "world"}]

  output:
    'hello, world!'

Run information:
    Worker name: N/A-0/1
    Start time: 2025-03-17 10:08:36
    End time: 2025-03-17 10:08:36
    Duration: 0:00:00
    Error: None

No custom actions defined for this task.

Congratulations, you have finished your first small Taskblaster workflow!

Adding new tasks

To add a new task to an existing workflow tree, edit the original workflow.py script and save it as a new file, workflow2.py:

import taskblaster as tb


@tb.workflow
class Workflow:
    greeting = tb.var(default='hello')
    whom = tb.var()
    username = tb.var(default='User')

    @tb.task
    def hello(self):
        return tb.node('greet', greeting=self.greeting, whom=self.whom)

    @tb.task
    def hello_user(self):
        return tb.node('greet', greeting=self.greeting, whom=self.username)


def workflow(runner):
    runner.run_workflow(Workflow(whom='world', username='Tara'))

You can now see what happens when you run the workflow:

$ tb workflow workflow2.py
entry:                   have  done     0/0        tree/hello 
                          add  new      0/0        tree/hello_user 

Try running ls:

$ tb ls
state    info       tags        worker         time     folder
──────── ────────── ─────────── ─────────── ─────────── ─────────────────────────────
done     0/0                    N/A-0/1        00:00:00 tree/hello
new      0/0                                            tree/hello_user

As you can see, the old task hello is still done, and the new task hello_user has been added to the tree in the new state.

Run the new task and list the tasks again:

$ tb run .
Starting worker rank=000 size=001
[rank=000 2025-03-17 10:08:37 N/A-0/1] Worker class: —
[rank=000 2025-03-17 10:08:37 N/A-0/1] Required tags: —
[rank=000 2025-03-17 10:08:37 N/A-0/1] Supported tags: —
[rank=000 2025-03-17 10:08:37 N/A-0/1] name: None
    tags: —
    required_tags: —
    resources: None
    max_tasks: None
    subworker_size: None
    subworker_count: None
    wall_time: None
[rank=000 2025-03-17 10:08:37 N/A-0/1] Main loop
[rank=000 2025-03-17 10:08:37 N/A-0/1] Running hello_user ...
[rank=000 2025-03-17 10:08:37 N/A-0/1] Task hello_user finished in 0:00:00.000860
[rank=000 2025-03-17 10:08:37 N/A-0/1] No available tasks, end worker main loop
$ tb ls
state    info       tags        worker         time     folder
──────── ────────── ─────────── ─────────── ─────────── ─────────────────────────────
done     0/0                    N/A-0/1        00:00:00 tree/hello
done     0/0                    N/A-0/1        00:00:00 tree/hello_user

The task is now marked as done and you can view the output with tb view.

Creating a conflict

We will now investigate what happens if we change the input to the workflow. Change the value of the input argument whom in workflow2.py from 'world' to 'new world'

import taskblaster as tb


@tb.workflow
class Workflow:
    greeting = tb.var(default='hello')
    whom = tb.var()
    username = tb.var(default='User')

    @tb.task
    def hello(self):
        return tb.node('greet', greeting=self.greeting, whom=self.whom)

    @tb.task
    def hello_user(self):
        return tb.node('greet', greeting=self.greeting, whom=self.username)


def workflow(runner):
    runner.run_workflow(Workflow(whom='new world', username='Tara'))

and save it as a new file workflow3.py.

Now run the workflow:

$ tb workflow workflow3.py
entry:               conflict  done     0/0 ❄ C    tree/hello 
                         have  done     0/0        tree/hello_user 

You are informed that there is a conflict for the task hello. You can also view the information about the conflict using the ls command:

$ tb ls -c sfcC
state    folder                        conflict    conflict info
──────── ───────────────────────────── ─────────── ───────────────
done     tree/hello                    conflict    Input changed. Old input
 ["greet", {"greeting": "hello", "whom": "world"}]
 New input:
 ["greet", {"greeting": "hello", "whom": "new world"}]
done     tree/hello_user               none        No conflict
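The conflict info above lists the old and the new input in full. As an illustration of how to pinpoint the difference, the following sketch compares the two input lines and reports only the keyword arguments that changed. The function changed_kwargs is a hypothetical helper written for this tutorial; Taskblaster's own comparison logic may work differently.

```python
import json


def changed_kwargs(old_input, new_input):
    """Map each keyword whose value differs to an (old, new) pair.

    Hypothetical helper for illustration only; it is not a Taskblaster API.
    """
    _, old_kwargs = json.loads(old_input)
    _, new_kwargs = json.loads(new_input)
    keys = sorted(set(old_kwargs) | set(new_kwargs))
    return {
        key: (old_kwargs.get(key), new_kwargs.get(key))
        for key in keys
        if old_kwargs.get(key) != new_kwargs.get(key)
    }


# The two input lines exactly as printed in the conflict info column.
old = '["greet", {"greeting": "hello", "whom": "world"}]'
new = '["greet", {"greeting": "hello", "whom": "new world"}]'
diff = changed_kwargs(old, new)
print(diff)  # {'whom': ('world', 'new world')}
```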

The additional argument -c sfcC specifies that we want to see the state, folder, conflict state and conflict info columns (try tb ls --help to see all options). From the output we can see which task has a conflict and what the previous input to the function was. Note that the state of the task is still done, so no output files have been deleted. The conflict state merely informs the user which tasks are affected by the change of input parameters.

We can choose to change the conflict state to resolved, meaning that we have noticed the conflict but want to continue doing calculations for this task based on the old input parameters:

$ tb resolve tree/hello
$ tb ls -csfcC
state    folder                        conflict    conflict info
──────── ───────────────────────────── ─────────── ───────────────
done     tree/hello                    resolved    Input changed. Old input
 ["greet", {"greeting": "hello", "whom": "world"}]
 New input:
 ["greet", {"greeting": "hello", "whom": "new world"}]
done     tree/hello_user               none        No conflict

The conflict state has now changed to resolved. If we change our minds, we can mark it as a conflict again:

$ tb unresolve tree/hello
$ tb ls -csfcC
state    folder                        conflict    conflict info
──────── ───────────────────────────── ─────────── ───────────────
done     tree/hello                    conflict    Input changed. Old input
 ["greet", {"greeting": "hello", "whom": "world"}]
 New input:
 ["greet", {"greeting": "hello", "whom": "new world"}]
done     tree/hello_user               none        No conflict

If we decide that we still want to go with the old input parameters, we can run the old workflow again:

$ tb workflow workflow2.py
entry:                   have  done     0/0        tree/hello 
                         have  done     0/0        tree/hello_user 
$ tb ls -csfcC
state    folder                        conflict    conflict info
──────── ───────────────────────────── ─────────── ───────────────
done     tree/hello                    none        No conflict
done     tree/hello_user               none        No conflict

and we can see that the conflict has disappeared.

However, suppose we want to run the task with the new input. We then first have to unrun the task to put it back in the new state. This will delete the output from the task:

$ tb unrun tree/hello --force
unrun:  done     hello
1 task were unrun.
$ tb ls -csfcC
state    folder                        conflict    conflict info
──────── ───────────────────────────── ─────────── ───────────────
new      tree/hello                    none        No conflict
done     tree/hello_user               none        No conflict

The task is now in the new state. Rerun workflow3 to apply the new input parameters:

$ tb workflow workflow3.py
entry:                 update  new      0/0        tree/hello 
                         have  done     0/0        tree/hello_user 

You can now see that the conflict has disappeared: hello is reported as update instead of conflict, and the task is in the new state with the new input. With tb view you can also verify that the output has been deleted. It is now possible to run the task with the new input parameters:

$ tb run tree/hello
Starting worker rank=000 size=001
[rank=000 2025-03-17 10:08:38 N/A-0/1] Worker class: —
[rank=000 2025-03-17 10:08:38 N/A-0/1] Required tags: —
[rank=000 2025-03-17 10:08:38 N/A-0/1] Supported tags: —
[rank=000 2025-03-17 10:08:38 N/A-0/1] name: None
    tags: —
    required_tags: —
    resources: None
    max_tasks: None
    subworker_size: None
    subworker_count: None
    wall_time: None
[rank=000 2025-03-17 10:08:38 N/A-0/1] Main loop
[rank=000 2025-03-17 10:08:38 N/A-0/1] Running hello ...
[rank=000 2025-03-17 10:08:38 N/A-0/1] Task hello finished in 0:00:00.000886
[rank=000 2025-03-17 10:08:38 N/A-0/1] No available tasks, end worker main loop

Finally, verify that both the input and the output have been updated:

$ tb view tree/hello
name: hello
  location:        /home/myuser/tmprepo/tree/hello
  state:           done
  target:          greet(…)
  wait for:        0 dependencies
  depth:           0
  source workflow: <root workflow>
  frozen by: (not frozen)

  latest handled inputs:
     None

  handlers:
    []

  handler data:
    <None>

  parents:
    <task has no dependencies>

  input:
    ["greet", {"greeting": "hello", "whom": "new world"}]

  output:
    'hello, new world!'

Run information:
    Worker name: N/A-0/1
    Start time: 2025-03-17 10:08:38
    End time: 2025-03-17 10:08:38
    Duration: 0:00:00
    Error: None

No custom actions defined for this task.
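
Over the course of this tutorial, tb workflow reported four different actions: add, have, conflict and update. The sketch below summarizes when each one appeared. It is a simplified model inferred only from the output shown in this tutorial; the function workflow_action is hypothetical, and Taskblaster's actual rules may cover task states and cases that are not modelled here.

```python
def workflow_action(existing, new_input):
    """Return the action label seen in this tutorial for one task.

    existing is None when the task is not yet in the tree, otherwise a
    (state, stored_input) pair. Hypothetical model, not a Taskblaster API.
    """
    if existing is None:
        return 'add'            # task is not in the tree yet
    state, stored_input = existing
    if stored_input == new_input:
        return 'have'           # task exists with identical input
    if state == 'done':
        return 'conflict'       # finished task keeps its output and is flagged
    if state == 'new':
        return 'update'         # task that has not run takes the new input
    raise ValueError(f'state {state!r} is not covered by this sketch')


old = ['greet', {'greeting': 'hello', 'whom': 'world'}]
new = ['greet', {'greeting': 'hello', 'whom': 'new world'}]

print(workflow_action(None, old))            # add
print(workflow_action(('done', old), old))   # have
print(workflow_action(('done', old), new))   # conflict
print(workflow_action(('new', old), new))    # update
```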