(WORK IN PROGRESS)
"just" facilitates building and managing bash-based pipelines on a cluster by
- Grouping commands under a task name/ID
- Sharing global variables among tasks (for example, paths to common programs)
- Easily scheduling tasks as jobs on a cluster (supports job dependencies for tasks with consecutive IDs)
Reasons to use "just":
- Modularity and reusable code: shorter debug cycles
- Reproducibility: don't struggle with your own scripts three months from now
- qsub logging:
  - STDOUT/STDERR are logged to files with meaningful names indicating the task they belong to
  - qsub logs are synced back to the master node
Usage:
- Define a sequence of indexed tasks in a file (here named 'tasks.just'):
0:shared_commands:{{
    # Anything written here is shared by all tasks at execution time.
    A=1
}}
1:write:{{
    echo $A >> $workdir/1.txt
    # The variable $A is known here because it is defined in task 0.
    # $workdir is set by the user on the command line via --workdir.
}}
2:read:{{
    cat $workdir/1.txt
}}
- Execute on the current machine: just.py tasks.just -s 1-2 --workdir test_just
- Schedule on a cluster: just.py tasks.just -s 1-2 --workdir test_just -q $QUEUE_NAME (e.g. -q '*@@nlp' on ND's CRC)
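Conceptually, executing a task amounts to running task 0's shared commands followed by the task's own body. The sketch below illustrates this for tasks 1-2 from the example above; it is plain shell only, not the actual just.py implementation, which additionally handles logging and qsub scheduling:

```shell
#!/bin/sh
# Conceptual sketch: what "just" effectively runs for tasks 1-2.
workdir=test_just             # normally supplied via --workdir
mkdir -p "$workdir"

# Task 1 ("write"): task 0's shared commands, then the task body.
A=1                           # shared variable defined in task 0
echo "$A" >> "$workdir/1.txt"

# Task 2 ("read"): shared commands again, then the task body.
A=1
cat "$workdir/1.txt"          # prints the accumulated contents of 1.txt
```

Note that task 1 appends (>>), so re-running it against the same $workdir adds another line to 1.txt.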