Bpipe Version 0.9.9.7

Download: bpipe-0.9.9.7.tar.gz

Summary

This release includes several major new features, including preliminary support for running Bpipe pipelines on cloud providers (Google Cloud, Amazon Web Services) and a new merge point operator that makes it easier to construct parallel pipelines using scatter-gather parallelism. In addition, significant work has been done to improve performance and reduce resource consumption on very highly parallel pipelines with large numbers of input / output files.

Features

  • Preliminary support for executing pipelines on Google Cloud Services (Compute Engine) and mounting storage for pipelines from Google Cloud Storage

  • Preliminary support for executing pipelines on Amazon Web Services using EC2 and mounting storage for pipelines from S3

  • The 'groovy' command can now run embedded Groovy code (executed outside Bpipe) using the Groovy runtime bundled with Bpipe

  • Support aliasing to string values in addition to outputs

  • Experimental support for a beforeRun hook in command config: execute arbitrary Groovy code before a command executes

  • Many performance improvements, especially for large, highly parallel pipelines

  • Support configuring the number of retries for status polling of HPC jobs (statusPollRetries setting)

  • Support for 'optional' inputs in pipelines: to make an input optional, suffix it with 'optional'. A 'flag' call can also be appended to add a flag to the command, e.g. ${input.csv.optional.flag('--csv')}

  • New operator: merge point operator (>>>) automatically configures a stage to merge outputs from a previous parallel split

  • Add region.bedFlag(flag) method for convenience when passing regions to commands

  • 'var' expressions may now be added in the main pipeline script, not just pipeline stages. These define optional variables, and provide a default.

  • JMS support now responds to a 'ping' message with a 'pong' reply when the JMS 'Reply-To' is set, allowing for status monitoring
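
The following is a minimal sketch of how several of the features above might be combined in one pipeline script. It is an illustration under stated assumptions, not a tested pipeline: the tool names (my_caller, my_merger), the stage names, the '-L' flag and the 'threads' variable are all hypothetical, and the placement of the >>> operator between a parallel block and the merging stage is assumed from the description above.

```groovy
// Hypothetical pipeline script (pipeline.groovy).
// my_caller and my_merger are placeholder tools, not real programs.

// A 'var' expression in the main script: an optional variable with a default.
var threads : 4

call_variants = {
    // input.csv.optional: the pipeline does not fail if no .csv input exists.
    // flag('--csv') adds the --csv flag for that input to the command.
    // region.bedFlag('-L') is assumed to pass this branch's region to the
    // tool via the given flag.
    exec """
        my_caller --threads $threads
                  ${input.csv.optional.flag('--csv')}
                  ${region.bedFlag('-L')}
                  $input.bam > $output.vcf
    """
}

merge_vcfs = {
    // Receives the VCF outputs of all parallel branches.
    exec "my_merger $inputs.vcf > $output.vcf"
}

run {
    // Scatter over chromosomes, then gather with the merge point operator.
    chr(1..22) * [ call_variants ] >>> merge_vcfs
}
```

Such a script would be started with something like "bpipe run pipeline.groovy sample.bam". Because 'threads' is declared with 'var', it only supplies a default, so a value provided from outside the script should take precedence.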

Fixes

  • Fix incorrect "abnormal termination" messages printed to console when a pipeline is stopped with 'bpipe stop'

  • Fix incorrect 'pre-existing' printed for outputs that were created by the pipeline

  • Fix genome not being accessible in the pipeline the first time it is downloaded, which printed an error

  • Re-execute checks if a command in the same stage has executed

  • Synchronize initialization of dir watcher to fix sporadic ConcurrentModificationExceptions

  • Fix empty embedded parallel stage list causing resolution of incorrect downstream input

  • Fix leak of 'var' variables across branches when 'using' applied to pipeline stage

  • Fix error if 4 or more arguments passed to "to" in transform

  • Fix Bpipe spuriously complaining on retry, but not on the original run, that outputs were not created

  • Fix some bugs where branch names were not being observed

  • Fix branch name sometimes inserted without separating period for transforms

  • Avoid redundantly putting branch name into files

  • Improved detail in error / log messages in a few places

  • Fix missing branch and '..' in filenames

  • Change: globally defined variables must now be held constant once pipeline starts

  • Fix split regions not stable between runs, set region id as branch name

  • Fix bed.split producing different splits if run repeatedly on same bed

  • Fix errors printed when SLF4J is referenced in user-loaded libraries

  • Fix NullPointerException / improve error message when filter is used with a mismatching output extension

  • Fix error in stage body resulting in confusing 'no associated storage' assertion failure

  • Add 'allowForeign' option to 'from' to let it process non-outputs

  • Reduce the number of retries and the retry interval when a file cannot be cleaned up