Bpipe Version 0.9.9.7

Summary

This release includes several major new features, including preliminary support for running Bpipe pipelines on cloud providers (Google Cloud, Amazon Web Services) and a new merge point operator that makes it easier to construct parallel pipelines using scatter-gather parallelism. In addition, significant work has gone into improving performance and reducing resource consumption on very highly parallel pipelines with large numbers of input / output files.

Features

Preliminary support for executing pipelines on Google Cloud Services (Compute Engine) and mounting storage for pipelines from Google Cloud Storage
Preliminary support for executing pipelines on Amazon Web Services using EC2 and mounting storage for pipelines from S3
The 'groovy' command can now run embedded groovy (executed outside Bpipe) using the groovy runtime bundled with Bpipe
Support aliasing to string values in addition to outputs
Experimental support for beforeRun hook in command config: execute arbitrary groovy code before a command executes
Many performance improvements, especially for large, highly parallel pipelines
Support configuration for number of retries for status polling of HPC jobs (statusPollRetries setting)
Support for 'optional' inputs in pipelines: to make an input optional, suffix it with 'optional'. 'flag' can also be appended to add flags in commands, e.g. ${input.csv.optional.flag('--csv')}
New operator: merge point operator (>>>) automatically configures a stage to merge outputs from a previous parallel split
Add region.bedFlag(flag) method for convenience when passing regions to commands
'var' expressions may now be added in the main pipeline script, not just pipeline stages. These define optional variables, and provide a default.
JMS support now responds to a 'ping' message with a 'pong' reply if the JMS 'Reply-To' is set, to allow for status monitoring
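
As a rough illustration of how several of the features above might be combined, here is a minimal pipeline sketch. It is not taken from the Bpipe documentation: the tools filter_tool and merge_tool, the stage names, the variable min_qual and the file extensions are all hypothetical, and the exact placement of the >>> operator after a parallel block is an assumption based on the description above.

```groovy
// Script-level 'var': an optional variable with a default value.
// (min_qual is a hypothetical name used only for this sketch.)
var min_qual : 20

filter_calls = {
    // input.csv is marked optional, so the stage can run without a CSV file;
    // flag('--csv') is the form quoted in the feature list above.
    // filter_tool is a placeholder command, not a real program.
    exec """
        filter_tool --min-qual $min_qual ${input.csv.optional.flag('--csv')} $input.vcf > $output.vcf
    """
}

merge_calls = {
    // merge_tool is a placeholder; $inputs.vcf refers to all VCF inputs
    // that reach this stage.
    exec "merge_tool $inputs.vcf > $output.vcf"
}

run {
    // Assumed usage: split by input file, then let the merge point operator
    // configure merge_calls to gather the outputs of the parallel branches.
    "%.vcf" * [ filter_calls ] >>> merge_calls
}
```

The intent is that the merge stage needs no explicit wiring of its inputs after the split; consult the Bpipe documentation for the precise semantics before relying on this form.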

Fixes

Fix incorrect "abnormal termination" messages printed to console when pipeline stopped with 'bpipe stop'
Fix incorrect 'pre-existing' printed for outputs that were created by pipeline
Fix genome not being accessible in the pipeline the first time it is downloaded, which printed an error
Re-execute checks if a command in the same stage has executed
Synchronize initialization of dir watcher to fix sporadic ConcurrentModificationExceptions
Fix empty embedded parallel stage list causing resolution of incorrect downstream input
Fix leak of 'var' variables across branches when 'using' applied to pipeline stage
Fix error if 4 or more arguments passed to "to" in transform
Fix Bpipe spuriously complaining on retry (but not on the original run) that outputs were not created
Fix some bugs where branch names were not being observed
Fix branch name sometimes inserted without separating period for transforms
Avoid redundantly putting branch name into files
Improved detail in error / log messages in a few places
Fix missing branch and '..' in filenames
Change: globally defined variables must now be held constant once pipeline starts
Fix split regions not stable between runs, set region id as branch name
Fix bed.split producing different splits if run repeatedly on same bed
Fix errors output if SLF4J referenced in user loaded libraries
Fix NPE / improve error message when filter is used with a mismatching output extension
Fix error in stage body resulting in confusing 'no associated storage' assertion failure
Add 'allowForeign' option to 'from' to let it process non-outputs
Lessen the retries and retry interval when file cannot be cleaned up
