Advanced Settings and Variables

Introduction

Bpipe supports a number of advanced settings that can be used to solve problems in particular environments. This page is a catch-all for miscellaneous settings that are not covered in other sections.

Settings

Output Scan Concurrency

If your pipeline produces very many input or output files, you may find that it pauses for a long time at particular points. This is because Bpipe needs to verify that every one of those files exists. That check alone can take a long time if file system calls have high latency, as they can on some file systems such as remotely mounted NFS partitions. To improve performance, you can have Bpipe execute file scans in parallel by setting the outputScanConcurrency parameter in your bpipe.config file, e.g.:

outputScanConcurrency=10

This will cause Bpipe to use up to 10 threads in parallel to scan the file system. Do not raise this value too high on systems where allocation of file handles is restricted, because each thread consumes a file handle of its own.
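The effect of this setting can be pictured with a small Python sketch. It is an illustration of the idea only, not Bpipe's actual implementation, and the function name missing_files is hypothetical: existence checks are handed to a pool of up to N worker threads, so the latency of each file system call overlaps with the others instead of adding up one after another.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def missing_files(paths, concurrency=10):
    """Return the paths that do not exist, checking up to `concurrency` at once.

    `concurrency` plays the role of outputScanConcurrency here; this is a
    hypothetical helper that mirrors the idea, not Bpipe's own code.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map preserves input order, so results line up with `paths`
        exists = list(pool.map(os.path.exists, paths))
    return [p for p, ok in zip(paths, exists) if not ok]
```

With a high-latency file system, 10 threads can in the best case bring the total scan time down to roughly a tenth of the sequential time, which is why raising the value helps until some other limit (such as file handles) is reached.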

Job Launch Separation

By default Bpipe launches jobs with almost no separation in time. That is, if you have 3000 commands that can run concurrently in a scheduling system, Bpipe may try to submit 3000 jobs within a few milliseconds. This creates a spike in load that is not friendly to the queuing system, and in some cases it can even cause failures if the system becomes overloaded while trying to digest so many new jobs. In these cases it can help to introduce a delay between job launches. To do this, add the following setting to your .bpipeconfig or bpipe.config file:

jobLaunchSeparationMs=3000

The above example spaces every command launch by at least 3 seconds. If you really have 3000 jobs to launch, this will make your whole pipeline take about 9000 seconds (2.5 hours) just to get all of them started, so you will need to balance the value of this setting against the capabilities and robustness of the system the jobs are running on.
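The arithmetic and the spacing rule can be made concrete with a short Python sketch. It is a hypothetical illustration of what a minimum launch separation means, not Bpipe's scheduler code; launch_all and startup_delay_seconds are made-up names, and launch stands in for whatever actually submits a job.

```python
import time


def launch_all(commands, launch, separation_ms=3000):
    """Call launch(cmd) for each command, keeping launches >= separation_ms apart."""
    interval = separation_ms / 1000.0
    last = None
    for cmd in commands:
        if last is not None:
            # Sleep only for whatever part of the interval has not elapsed yet
            wait = interval - (time.monotonic() - last)
            if wait > 0:
                time.sleep(wait)
        last = time.monotonic()
        launch(cmd)


def startup_delay_seconds(n_jobs, separation_ms):
    """Minimum time from the first launch to the last: n jobs have n - 1 gaps."""
    return (n_jobs - 1) * separation_ms / 1000.0
```

For the example above, startup_delay_seconds(3000, 3000) gives 8997.0 seconds, which is the roughly 9000 seconds quoted.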

File Watcher Setting

By default Bpipe uses inotify so that it is notified of file system updates efficiently. This minimises the load Bpipe itself puts on the file system, since it does not need to check files explicitly to find out when they are updated.

Unfortunately inotify has per-user limits, so with large or complex pipelines, or if you run very many pipelines at once, you may receive errors about the inotify limit being exceeded.

If this happens, you can make Bpipe fall back to manual file scanning, which monitors file timestamps instead, by setting the following in the bpipe.config file:

usePollerFileWatcher=true
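Polling avoids the inotify limit at the cost of extra file system load: instead of being notified, the watcher repeatedly reads each file's modification time and compares it with the last value seen. A minimal Python sketch of that idea follows; wait_for_update and its poll_interval and timeout parameters are hypothetical, and Bpipe's own polling interval is not documented on this page.

```python
import os
import time


def wait_for_update(path, since_mtime, poll_interval=1.0, timeout=None):
    """Poll path's modification time until it is newer than since_mtime.

    Returns the new mtime, or None if `timeout` seconds pass first.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            mtime = os.stat(path).st_mtime
            if mtime > since_mtime:
                return mtime
        except FileNotFoundError:
            pass  # file not created yet; keep polling
        if deadline is not None and time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

Each poll is a stat call per watched file, which is why this mode puts more load on the file system than inotify does.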

Post Command Hook

If you want to run something after every command finishes, you can set it as a "post command" using the post_cmd configuration setting. For example, to print the date and time when each command completes, you can put the following in bpipe.config:

post_cmd="""
echo "Command finished at: `date`"
"""

Note: this is not supported by every executor, but is supported by most, including the local executor.
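Conceptually, the hook amounts to running the post_cmd text in the same shell right after the command. The Python sketch below illustrates that with bash, under two assumptions this page does not confirm: that the hook runs whether or not the command succeeded, and that the command's own exit status is what gets reported. run_with_post_cmd is a hypothetical helper, not part of Bpipe.

```python
import subprocess

POST_CMD = 'echo "Command finished at: `date`"'


def run_with_post_cmd(command, post_cmd=POST_CMD):
    """Run command in bash, then post_cmd; return (command exit status, stdout)."""
    # The command runs in a subshell so that an `exit` inside it cannot skip
    # the post command; its status is saved and re-raised at the end.
    script = f"(\n{command}\n)\n__status=$?\n{post_cmd}\nexit $__status"
    result = subprocess.run(["bash", "-c", script], capture_output=True, text=True)
    return result.returncode, result.stdout
```

For example, run_with_post_cmd("echo hi") returns status 0 with output that starts with "hi" and ends with the "Command finished at:" line.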

Special Variables

Sometimes it is helpful for your pipeline script to know details about its environment. The following table lists variables that are available to pipeline scripts:

Variable                        Meaning
bpipe.Config.scriptDirectory    The directory in which the currently running pipeline script is located