Merge points
Synopsis
<branch definition> * [ <stage1>,<stage2>,...] >>> <merge stage>
Behavior
Defines a stage that is identified as a merge point for a preceding set of parallel stages.
This is nearly the same as using the +
operator, however it causes Bpipe to name the outputs
of the merge stage differently. Specifically, ordinarily, the merge stage would name
its output according to the first input by default. However this often leads to misleadingly
named outputs that appear to be derived only from the first parallel branch of the parallel segment.
When the mergepoint operator is used, Bpipe will still derive the output name from the first input,
however it will excise the branch name from the file name of that input and replace it with "merge",
so that the output is clearly identified as a merge of previous inputs.
The merge point operator is particularly useful when dynamic branching constructs are used such that you cannot anticipate exactly what the branch names will be beforehand.
Examples
Merge Outputs from Three Pipeline Branches Together
Here a pipeline branches three ways with with branches called
foo
, bar
and baz
. If the >>>
operator was not used, the final output
would end with foo.there.world.xml
. However because the merge point
operator is applied, the final output ends with .merge.there.world.xml
hello = {
exec """
cp -v $input.txt $output.csv
"""
}
there = {
exec """
cp -v $input.csv $output.tsv
"""
}
world = {
exec """
cat $inputs.tsv > $output.xml
"""
}
run {
hello + ['foo','bar','baz'] * [ there ] >>> world
}
Split hg19 500 ways and merge the results back together
Note that here we make use of Bpipe's automatic region splitting and magic
$region.bed
variable.
genome 'hg19'
compute_gc_content = {
exec """gc_content -L $region.bed $input.txt > $output.gc.txt"""
}
calculate_mean = {
exec """
cat $inputs.txt | gngs 'println(graxxia.Stats.mean())' > $output.mean.txt
"""
}
run {
hg19.split(500) [ compute_gc_content ] + calculate_mean
}