Bpipe Version 0.9.9.7
Download: bpipe-0.9.9.7.tar.gz
Summary
This release includes several major new features including prelimary support for running Bpipe pipelines on cloud providers (Google Cloud, Amazon Web Services), a new merge point operator for making it easier to construct parallel pipelines using scatter-gather parallelism. In addition to these, significant work has been done to dramatically improve performance and reduce resource consumption on very highly parallel pipelines with large numbers of input / output files.
Features
-
Preliminary support for executing pipelines on Google Cloud Services (Compute Engine) and mounting storage for pipelines from Google Cloud Storage
-
Preliminary support for executing pipelines on Amazon Web Services using EC2 and mounting storage for pipelines from S3
-
The 'groovy' command can now run embedded groovy (executed outside Bpipe) using the groovy runtime bundled with Bpipe
-
Support aliasing to string values in addition to outputs
-
Experimental support for beforeRun hook in command config: execute arbitrary groovy code before a command executes
-
Many performance improvements, esp. for large, highly parallel pipelines
-
Support configuration for number of retries for status polling of HPC jobs (statusPollRetries setting)
-
Support for 'optional' inputs in pipelines: to make input optional, suffix with 'optional'. Also can add 'flag' to add flags in commands eg: ${input.csv.optional.flag('--csv')}
-
New operator: merge point operator (>>>) automatically configures a stage to merge outputs from a previous parallel split
-
Add region.bedFlag(flag) method for convenience when passing regions to commands
-
'var' expressions may now be added in the main pipeline script, not just pipeline stages. These define optional variables, and provide a default.
-
JMS support now responds to 'ping' message with 'pong' reply if JMS 'Reply-To' is set to allow for status monitoring
Fixes
-
Fix incorrect "abnormal termination" messages printed to console when pipeline stopped with 'bpipe stop'
-
Fix incorrect 'pre-existing' printed for outputs that were created by pipeline
-
Fix genome not accessible in pipeline the first time downloaded, printing error
-
Re-execute checks if a commmand in the same stage has executed
-
synchronize initialization of dir watcher to fix sporadic ConcurrentModificationExceptions
-
Fix empty embedded parallel stage list causing resolution of incorrect downstream input
-
Fix leak of 'var' variables across branches when 'using' applied to pipeline stage
-
Fix error if 4 or more arguments passed to "to" in transform
-
Fix bpipe complaining spurious outputs not created on retry, but not original run
-
Fix some bugs where branch names were not being observed
-
Fix branch name sometimes inserted without separating period for transforms
-
Avoid redundantly putting branch name into files
-
Improved detail in error / log messages in a few places
-
Fix missing branch and '..' in filenames
-
Change: globally defined variables must now be held constant once pipeline starts
-
Fix split regions not stable between runs, set region id as branch name
-
Fix bed.split producing different splits if run repeatedly on same bed
-
Fix errors output if SLF4J referenced in user loaded libraries
-
Fix npe / improve error message when filter used with mismatching output ext
-
Fix error in stage body resulting in confusing 'no associated storage' assertion failure
-
Add 'allowForeign' option to 'from' to let it process non-outputs
-
Lessen the retries and retry interval when file cannot be cleaned up