Resources
TaskBlaster provides various management tools so you can run the jobs you want when, where, and how you want, handle failures, and manage the different resource requirements of your tasks.
In this tutorial, we will cover how to submit jobs to the TaskBlaster queue, how to tag tasks, and finally, the different ways you can execute tasks, e.g. by submitting jobs to the SLURM queue or by running them interactively.
Handling states
TaskBlaster tasks have their own set of states to help you manage them. You can view all available states using the tb stat command.
$ tb init
Created repository using module "taskblaster.repository" in "/home/myuser/tmprepo".
$ tb stat
new      0
queue    0
run      0
done     0
fail     0
partial  0
cancel   0
Tasks in the new state can have their inputs updated. Tasks in the queue state are able to be picked up by workers. You can place tasks into the queue by running the following command:
$ tb submit
state deps tags folder
────────────────────────────────────────────────────────
Submitted 0 tasks
Once a task has been picked up by a worker, the state is updated to run. If the job fails, TaskBlaster automatically updates the state to fail, and all dependent tasks are updated to the cancel state. If the task is successful, the state is updated to done. The state partial is reserved for dynamical tasks: it indicates dynamical tasks which have started and partially completed but have stopped before reaching the “finalizer” stage. Unrunning a task in the fail state will put the task back into the new state, along with any tasks in the cancel state that depend on it.
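As a hedged illustration of unrunning (the path tree/A/mytask is hypothetical, and it is assumed here that the unrun subcommand takes a tree path like the other commands in this tutorial):
tb unrun tree/A/mytask    # fail -> new; cancelled dependents also return to new
tb submit tree/A/mytask   # re-queue the task once the problem has been fixed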
Submitting
The first task management tool is the TaskBlaster “queue”. In order for a job to be picked up by a worker, the task must be in the “queue” state. You can see the state of a task with tb ls <tree/or/tree/path>. Changing the state of a task from new to queue is achieved by running the following command from within the TaskBlaster repository:
tb submit <path>
Now the task is in the queue state. Once a TaskBlaster worker is submitted and starts, the worker will be able to see the task and pick it up.
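For example, a minimal sketch of this flow (the path tree/Ag/relax is hypothetical):
tb ls tree/Ag/relax       # the task is listed in the new state
tb submit tree/Ag/relax   # state changes from new to queue
tb ls tree/Ag/relax       # the task is now listed as queue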
Note
The TaskBlaster queue is completely separate from the SLURM queue. Furthermore, unless tasks are tagged and a worker is submitted with a specific tag, tasks picked up by the workers are not associated with any specific SLURM job. Tasks are only assigned to a worker once the SLURM job starts and the worker picks up a task.
Tagging tasks
Tagging is a versatile way of pairing specific tasks, or sets of tasks, with specific workers at submission time, i.e. when submitting a worker to the SLURM queue.
Managing Failures
If you have a specific type of error related to resource allocation, you can target that failure string as follows:
tb tag -F FAILURE_STRING <tree/> --add TAG_FAILURE_NAME
and executing
tb tag | grep TAG_FAILURE_NAME
will list all the tasks with that tag.
To remove a tag from a task, you can specify the --untag argument:
tb tag -F FAILURE_STRING <tree/> --untag TAG_FAILURE_NAME
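As a hedged example, suppose several tasks failed with an out-of-memory message; the failure string MemoryError and the tag name oom below are made up for illustration:
tb tag -F MemoryError tree/ --add oom    # tag failed tasks whose failure string contains MemoryError
tb tag | grep oom                        # list the tagged tasks
tb tag -F MemoryError tree/ --untag oom  # remove the tag again if needed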
Targeting specific tasks
You can target specific tasks to tag by specifying the path to the task’s directory name:
tb tag <tree/**/target_taskname> --add TAG_LABEL
where ** will match all files and directories in the current directory and subdirectories, locating all target_taskname tasks along the path.
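For instance, a small sketch using the pattern above (the task name relax and the tag highmem are hypothetical):
tb tag tree/**/relax --add highmem    # tag every task directory named relax under tree/
tb tag | grep highmem                 # verify which tasks received the tag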
Another way to utilize tags to manage resources is discussed below in the workers section.
Executing and submitting tasks
There are two ways to run a task in TaskBlaster:
1. tb run <path>, which will launch a worker to execute queued tasks.
2. tb workers submit, which interfaces with MyQueue to submit SLURM jobs.
1.1 Interactively run tasks
The first run mode can be broken down into two distinct modes. You can think of this one as an interactive mode which you execute from the terminal. From within the TaskBlaster repo, you simply run tb run <path> and all queued tasks on that path will be picked up. The logging information is dumped to the screen. This mode is useful for very fast tasks, or if you are trying to debug code using breakpoint().
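A minimal interactive session might look as follows (using the repository created earlier in this tutorial; the path tree/Ag is hypothetical):
cd /home/myuser/tmprepo   # the repository created with tb init
tb submit tree/Ag         # queue the tasks under tree/Ag
tb run tree/Ag            # pick up and execute the queued tasks, logging to the terminal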
1.2 Submitting workers via slurm scripts
The most familiar, albeit manual, way to submit a TaskBlaster worker is to create a SLURM script that loads the Python environment and then runs the worker via srun from within the top-level tree directory of an existing TaskBlaster repository. All tasks in the queued state are picked up and executed on an HPC resource by that worker. One such example SLURM script is provided below.
#!/bin/bash -l
#SBATCH -J <job_name>
#SBATCH --output=<job_output>-%j.out
#SBATCH --time=0-01:00:00
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --partition=partition_name
source /active/python/env/command
cd /path/to/root/tree
srun tb run .
This process is time-consuming, as one would have to manually edit the submission script every time the resources, worker specification, or directory needs to change. A more streamlined way to submit jobs with TaskBlaster is to use TaskBlaster’s MyQueue interface, described below.
2.1 Setup myqueue
TaskBlaster is integrated with MyQueue, and with a bit of configuration you can efficiently use TaskBlaster’s MyQueue interface to handle the submission of workers to various HPC resource queues.
After installing MyQueue, execute the command:
mq config
which detects your HPC resources and prints a configuration file for them to the terminal. You should copy and paste the text into .myqueue/config.py. MyQueue provides detailed documentation on configuration.
This allows the user to specify resources on the command line via the TaskBlaster UI with the -R flag, which is forwarded to MyQueue. Now, when submitting a job, TaskBlaster will use MyQueue and will also display convenient runtime information obtained from MyQueue in the tb ls command to report job statuses and failures, such as the information listed under the worker and time columns below:
state deps tags worker time folder
───────────────────────────────────────────────────────────────────────────────
done 0/0 7654463-3/4 00:00:00 tree/Ag/material
done 1/1 7654902-0/4 00:00:00 tree/Ag/niggli_normalize
done 1/1 7654902-0/4 00:00:01 tree/Ag/reduce_to_primitive
2.2 Submitting workers
With MyQueue set up, workers can be submitted by specifying the resource flag -R, followed by MyQueue resources, as follows:
tb workers submit -R 8:xeon24:10h
An additional configuration file can be specified for associating workers with resources. You can think of this as a persistent resource-tag allocation, giving TaskBlaster a way to associate tags with resources at submission time.
To set this up, you can execute the command:
tb workers config
which will display the location of the resources.py file used to define worker configurations. An example configuration file is shown below:
resources = {
    'alberich1': {'tags': {'hello'}},
    'alberich2': {'required_tags': {'eggs', 'spam'}},
    'mq_alberich1': {'tags': {'hello'}, 'resources': '1:alberich1:1m'},
    'superworker': {'tags': {'hello', 'highmem', 'gpu'}},
    'parallel_hello': {'resources': '40:hello:1h'},
    'parallel': {'resources': '40:1h'},
    'lowmem': {'tags': {'memoryerror'}, 'resources': '1:alberich1:1m'},
    'highmem': {},
}
And you can submit workers with resources specified in the resources.py file as follows:
tb workers submit parallel_hello
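As a hedged end-to-end sketch using the example resources.py above (the task name relax is hypothetical; the tag hello and the worker name mq_alberich1 are taken from the example configuration):
tb tag tree/**/relax --add hello    # tag the tasks this worker should be allowed to pick up
tb workers submit mq_alberich1      # worker configured with tags {'hello'} and resources '1:alberich1:1m'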
Subworkers
You can run multiple tasks within a single SLURM job using TaskBlaster’s subworker commands. These options allow you to partition the available resources by either specifying the number of CPUs per subworker with --subworker-size=num or setting the number of concurrent subworkers with --subworker-count=num.
The use of this command requires a special communicator setup, and care must be taken when calling specific commands, since nested srun commands are not allowed. You can specify how many subworkers to run, or how large each should be, in the following way:
tb workers submit -R 8:xeon24:10h --subworker-size=4
or:
tb workers submit -R 8:xeon24:10h --subworker-count=2
These two commands are equivalent: each specifies two subworkers with 4 processors each per “worker”, i.e. per SLURM job.