The from statement
Synopsis
from(<pattern>,<pattern>,...|[<pattern1>, <pattern2>, ...]) {
< statements to process files matching patterns >
}
from(<pattern>,<pattern>,...|[<pattern1>, <pattern2>, ...]) [filter|transform|produce](...) {
< statements to process files matching patterns >
}
Behavior
The from statement reshapes the inputs to be the most recent output file(s) matching the given pattern for the following block. This is useful when a task needs an input that was produced earlier in the pipeline than the previous stage, or other similar cases where your inputs don't match the defaults that Bpipe assumes.
Often a from would be embedded inside a produce, transform, or filter block, but that is not required. In such a case, from can be joined directly to the same block by preceding the transform or filter directly with the 'from' statement.
The patterns accepted by from are glob-like expression using *
to represent
a wildcard. A pattern with no wild card is treated as a file extension, so for
example "csv
" is treated as "*.csv
", but will only match the first (most
recent) CSV file. By contrast, using *.csv
directly will cause all
CSV files from the last stage that output a CSV file to match the first
parameter. This latter form is particularly useful for gathering all the files
of the same type output by different parallel stages.
When provided as a list, from will accumulate multiple files with different extensions. When multiple files match a single extension they are used sequentially each time that extension appears in the list given.
Note: using from
in a nested way (within an existing from
clause) is not supported and
may result in undefined behavior.
Examples
Use most recent CSV file to produce an XML file
create_excel = {
transform("xml") {
from("csv") {
exec "csv2xml $input > $output"
}
}
}
Use 2 text and CSV files to produce an XML file
// Here we are assuming that some previous stage is supplying
// two text files (.txt) and some stage (possibly the same, not necessarily)
// is supplying a CSV file (.csv).
create_excel = {
from("txt","txt","csv") {
exec "some_command $input1 $input2 $input3 > $output.xml" // input1 and input2 will be .txt, input3 will be .csv
}
}
Match all CSV files from the last stage that produces a XML file
from("*.csv") transform("xml") {
exec "csv2xml $inputs.csv > $output"
}