The Transform Statement

transform(<transform name>) {
    < statements to transform inputs >
}


transform(<input file pattern>, ...) to(<replacement pattern>, ...) {
    < statements to transform inputs >
}



Transform is a convenient alias for produce where the name of the output or outputs is deduced from the name of the input(s) by modifying the file extension. For example, if you have a command that converts a CSV file called foo.csv to an XML file, you can easily declare a section of your script to output foo.xml using a transform with the name 'xml'.

The output(s) that are automatically deduced by transform will inherit all the behavior implied by the produce statement.
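
To make the alias concrete, the sketch below writes the same stage twice, once with produce and once with transform. It assumes a single input named foo.csv. The stage names are illustrative, and csv2xml is a placeholder for whatever CSV-to-XML converter you use.

```groovy
// Sketch only: assumes the stage receives one input, foo.csv.
// csv2xml is a placeholder command, not a tool shipped with Bpipe.

// Explicit form: the output file name is spelled out by hand.
to_xml_explicit = {
    produce("foo.xml") {
        exec "csv2xml $input.csv > $output.xml"
    }
}

// Transform form: the output name foo.xml is deduced by replacing the
// input's extension with "xml".
to_xml = {
    transform("xml") {
        exec "csv2xml $input.csv > $output.xml"
    }
}
```

The transform version keeps working if the input is renamed, because the output name follows the input name rather than being hard-coded.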

Since version, transform has offered an extended form that allows you to do more than just replace the file extension. This form uses two parts, taking the form:

transform(<input file pattern>) to(<output file pattern>) { ... }

The input and output patterns are assumed to match to the end of the file name, but can include a regular expression pattern for matching the input files.

Note: input file patterns that contain no regular expression characters, or that end in "." followed by plain characters, are treated as file extensions. For example, ".xml" is treated as the literal ".xml", not as "any character followed by xml".

Note: when the form '*.ext' is used, all inputs with the given extension are matched and their transforms become expected outputs. By contrast, the form '.ext' or 'ext' causes only the first input matching the extension '.ext' to generate an expected output. Specifying a single extension multiple times does not map multiple inputs with that extension. Instead use the wildcard form to match multiple inputs.
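
As a rough illustration of the rule in this note, the sketch below uses the plain extension form. The input names a.fastq.gz and b.fastq.gz and the stage name are assumptions made for the example.

```groovy
// Sketch only: assumes the stage receives two inputs,
// a.fastq.gz and b.fastq.gz.

// Plain extension form: per the note above, only the first matching
// input (a.fastq.gz) produces an expected output (a.fastq).
gunzip_first = {
    transform('.fastq.gz') to('.fastq') {
        exec "gunzip -cf $input.fastq.gz > $output.fastq"
    }
}

// Changing the pattern to '*.fastq.gz' would make b.fastq an expected
// output as well, so the body would then need a command for each input.
```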


You can also declare a whole pipeline stage as a transform by adding the Transform annotation before the stage, in the form @Transform(<transform name>). This form is less flexible, but more concise when you don't need the flexibility.
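
A minimal sketch of the annotation form follows. The stage name is illustrative and csv2xml is a placeholder command; the annotation applies to the entire stage body, so no nested transform block is needed.

```groovy
// Sketch only: csv2xml is a placeholder command, not a real Bpipe tool.
// The annotation marks the whole stage as a transform to the "xml" extension.
@Transform("xml")
csv_to_xml = {
    exec "csv2xml $input > $output"
}
```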


Convert a CSV File to XML

transform("xml") {
  exec """
    csv2xml $input > $output
  """
}

Run FastQC on a Gzipped FASTQ file

FastQC names its output files using an unusual convention. To match this convention, we can use the extended form of transform:

fastqc = {
    transform('.fastq.gz') to('_fastqc.zip') {
        exec "fastqc -o . --noextract $inputs"
    }
    forward input
}

Note also that, since the output zip files from FastQC are usually not used downstream, we forward the input files instead of following the default, which forwards the output files to the next stage.

Gunzip Two Files Using a Wildcard Pattern

    transform('*.fastq.gz') to('.fastq') {
        exec """
            gunzip -cf $input1.fastq.gz > $output1.fastq

            gunzip -cf $input2.fastq.gz > $output2.fastq
        """
    }

Here the wildcard pattern matches every input file ending in .fastq.gz, and each output name is formed by replacing .fastq.gz with .fastq.