README
batch-vs-runner
Description
CLI tool for running virtual screening (or other batch processing on chemical structures) on multiple processors by creating batches and running them in parallel.
Usage of batch-vs-runner: batch-vs-runner [FLAGS] [SD|PDB|PDBQT|MOL2|DIRECTORY]...
  -batchEnd int
        end at Nth molecule, 0 means all molecules
  -batchSize int
        batch size (default 100)
  -batchStart int
        start from Nth molecule (cumulative across all input files) (default 1)
  -delay int
        delay a certain amount of time (in ms) between spawning the next process, useful for programs that periodically do heavy IO
  -enableSlurm
        detect slurm allocations based on environment variable and use srun to run jobs (default true)
  -exec string
        command to execute in worker (default "./job.sh")
  -lineBreak string
        linebreak for output structure: unix, dos, or mac (default "unix")
  -np int
        no. of worker processes (does not apply if slurm mode is in use) (default 1)
  -prefix string
        prefix on individual job work directory (default "job")
  -slurmNodeTaskOverride string
        override how many tasks to distribute to each node from the env received from slurm
  -verbose
        pass through worker script output to terminal
  -workspace string
        path to job setup files (can be a directory or single file) (default ".")
  -workspaceOnly
        generate workspace only but do not execute any job, you can use anything to execute the job once the workspace has been compiled
Get Started
- Create a folder as the "template" for each batch's workspace. At runtime, the program automatically generates a workspace for each batch from this folder. Put any files that should be copied to every workspace (configuration files, batch scripts, etc.) here. In addition, the program automatically generates the batch of molecules for each job, named "job.sd", "job.sdf", "job.mol2", etc., depending on the input file extension. See the Execution Environment section for details on how to write the template workspace.
- Execute "batch-vs-runner" with the corresponding flags. Flags use the Go standard library style, -key=value. Examples:
  -workspace=path/to/my_workspace: the workspace template is at path/to/my_workspace
  -np=20: 20 parallel processes
  -verbose=true: pass through worker script output to terminal
  -delay=1000: delay 1000 ms before starting the next process during initialization
  -batchSize=10: override batch size to 10 molecules
  -batchEnd=100: end at the 100th molecule (cumulative across all input files specified)
Full examples:
- ./batch-vs-runner -np=30 -workspace=my_dock_job_template -batchSize=50 my_library.sdf
  Split my_library.sdf into 50-molecule batches and generate a workspace just like the my_dock_job_template folder for each batch. Run job.sh in each batch with 30 parallel processes.
- ./batch-vs-runner -workspace=my_dock_job_template -batchEnd=100 -batchSize=100 -workspaceOnly=true my_library.sdf
  Split my_library.sdf into 100-molecule batches, ending at the 100th molecule, and generate a workspace just like the my_dock_job_template folder. Only generate the workspace; do not execute job.sh. You can cd into the work directory and do whatever you want. Mainly used for testing and debugging.
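To make the interaction between -batchStart, -batchEnd and -batchSize concrete, here is a minimal Go sketch of the arithmetic. It is not the tool's actual code: the batchRange type and the batchRanges function are hypothetical names, and the sketch assumes 1-based inclusive molecule indices and that -batchEnd=0 means "through the last molecule", as the flag descriptions suggest.

```go
package main

import "fmt"

// batchRange is a hypothetical type holding the 1-based, inclusive
// molecule indices that belong to one batch.
type batchRange struct {
	First, Last int
}

// batchRanges splits molecules start..end into consecutive batches of at
// most size molecules. total is the number of molecules across all input
// files; end == 0 is treated as "through the last molecule".
func batchRanges(start, end, size, total int) []batchRange {
	if end == 0 || end > total {
		end = total
	}
	if start < 1 {
		start = 1
	}
	var out []batchRange
	if size < 1 {
		return out
	}
	for first := start; first <= end; first += size {
		last := first + size - 1
		if last > end {
			last = end // the final batch may be smaller than size
		}
		out = append(out, batchRange{first, last})
	}
	return out
}

func main() {
	// Mirrors -batchSize=50 on an assumed 120-molecule library.
	for i, b := range batchRanges(1, 0, 50, 120) {
		fmt.Printf("batch %d: molecules %d-%d\n", i+1, b.First, b.Last)
	}
}
```

Under these assumptions, a 120-molecule library with -batchSize=50 yields three batches (1-50, 51-100, 101-120), and -batchEnd=100 with -batchSize=100 yields a single batch.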
Execution Environment
Template folder
- Files preserve their path relative to the template folder when they are compiled, so workspace/some_dir/file.txt will be copied to job_*_*/some_dir/file.txt upon execution. File modes are also copied, with one exception: common executable files (.sh, .bash, .run) automatically receive executable permission when they are compiled to the workspace.
- Files with the .tpl extension are processed through the Go text/template system, and they are executed with the template context (.) set to the batch definition for each batch job. See example/gold/example.txt.tpl as an example.
- A job.<ext> file is automatically generated containing the molecules belonging to the batch. <ext> is mol2, sd, sdf, pdb, or pdbqt, depending on the input molecule format.
- I recommend not leaving empty folders in the template directory. If you want to explicitly create an empty folder, use mkdir in job.sh, or put a placeholder file in it, e.g. touch template/empty_dir/.keep
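As an illustration of the .tpl mechanism, the sketch below renders a small template with Go's text/template, binding "." to a batch definition. The batchDef struct and its Index and JobFile fields are hypothetical stand-ins; the fields the tool actually exposes may be named differently, so check the bundled example template before relying on them.

```go
package main

import (
	"os"
	"strings"
	"text/template"
)

// batchDef is a hypothetical stand-in for the batch definition that the
// tool passes to templates; the real field names may differ.
type batchDef struct {
	Index   int
	JobFile string
}

// render parses tplText and executes it with "." bound to def.
func render(tplText string, def batchDef) (string, error) {
	tpl, err := template.New("example.txt.tpl").Parse(tplText)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tpl.Execute(&sb, def); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// A made-up configuration template for some docking program.
	const tplText = "ligands = {{.JobFile}}\nbatch = {{.Index}}\n"
	out, err := render(tplText, batchDef{Index: 3, JobFile: "job.sdf"})
	if err != nil {
		panic(err)
	}
	os.Stdout.WriteString(out)
}
```

With the assumed fields, this prints a two-line configuration file whose values come from the batch definition, which is the same substitution a .tpl file in the template folder would undergo for each batch.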
HPC environment with slurm
This program can automatically parse the environment variables set by slurm and distribute jobs to the allocated nodes. Files are not transferred automatically as of now, so the program must be run on shared storage. No extra configuration is needed.
To explicitly disable this behavior (i.e., do not use srun and run all job shell files on the master node), use -enableSlurm=false.
Use the -slurmNodeTaskOverride flag to override how many tasks to distribute to each node. The format is a comma-separated list of numbers, or numbers followed by (xN), where N means the same configuration applies to N nodes.
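The override format can be read as a compact per-node task list. The following Go sketch expands such a string under one plausible interpretation, namely that "8(x2)" means 8 tasks on each of 2 nodes (the notation resembles the one slurm uses in SLURM_TASKS_PER_NODE). The expandNodeTasks function is a hypothetical helper written for illustration, not the tool's own parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// expandNodeTasks is a hypothetical helper that expands a spec such as
// "4,8(x2)" into one task count per node, assuming "N(xM)" means
// "N tasks on each of M nodes".
func expandNodeTasks(spec string) ([]int, error) {
	var tasks []int
	for _, item := range strings.Split(spec, ",") {
		item = strings.TrimSpace(item)
		repeat := 1
		// Peel off an optional "(xM)" suffix.
		if i := strings.Index(item, "(x"); i >= 0 && strings.HasSuffix(item, ")") {
			n, err := strconv.Atoi(item[i+2 : len(item)-1])
			if err != nil || n < 1 {
				return nil, fmt.Errorf("bad repeat count in %q", item)
			}
			repeat = n
			item = item[:i]
		}
		count, err := strconv.Atoi(item)
		if err != nil || count < 0 {
			return nil, fmt.Errorf("bad task count in %q", item)
		}
		for r := 0; r < repeat; r++ {
			tasks = append(tasks, count)
		}
	}
	return tasks, nil
}

func main() {
	tasks, err := expandNodeTasks("4,8(x2)")
	if err != nil {
		panic(err)
	}
	fmt.Println(tasks) // prints [4 8 8]
}
```

Under this reading, -slurmNodeTaskOverride=4,8(x2) would place 4 tasks on the first node and 8 on each of the next two.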
job.sh file
The default command to execute for each batch job is bash -c ./job.sh. Thus, just add a script called job.sh in the template folder and it will be run automatically during runtime. Call your docking software in job.sh and ask it to dock file job.sdf, job.mol2 etc. depending on your input molecule type.
NOTE: The working directory for each batch script is the batch folder, so if you have a software.conf or software.conf.tpl in your job template, the correct way to refer to that file in the job script is just software.conf or ./software.conf. If you want to override this behavior, use cd in your job.sh.
Examples
See examples/ folder for some example workspace templates.