CWL - glob an array of output files based on a single input string
4.7 years ago
drkennetz ▴ 560

Hi all,

I am writing a CWL CommandLineTool for fastp. fastp can take a single fastq or a fastq pair and split each output fastq by number of lines, which is useful for an embarrassingly parallel workflow. I want the tool to require fastq1 as input and optionally accept fastq2 (because you can trim in single-end or paired-end mode). That means each output could be either a single file or an array of files. I have handled the optional input fastqs, but I am not sure how to handle the outputs.

For example, say fastq1 is called myfastq_R1_.fastq.gz and has 20,000,000 lines, and I name the output fastq myfastq_R1_.trimmed.fastq.gz. If I split that by 2,000,000 lines each, the outputs would be named 001.myfastq_R1_.trimmed.fastq.gz, 002.myfastq_R1_.trimmed.fastq.gz, and so on up to 010.myfastq_R1_.trimmed.fastq.gz. Optionally, I could add a second fastq and do the same thing.

The glob is the confusing part, because the output could be the specified string (myfastq_R1_.trimmed.fastq.gz) or that filename with an incrementing number prefix. Here is my tool currently:

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool

baseCommand: [fastp]

label: fastp adapter trimmer
doc: |
  fastp -i <fastq> -o <fastq_out> -I <fastq2?> -O <fastq2_out?> args.

inputs:

  ##################
  # Required input #
  ##################

  fastq:
    type: File
    inputBinding:
      prefix: -i
    doc: -i FILE    read1 input file name

  fastq_out:
    type: string
    inputBinding:
      prefix: -o
    doc: -o STRING  read1 output file name

  fastq2:
    type: File?
    inputBinding:
      prefix: -I
    doc: -I FILE    read2 input file name

  fastq2_out:
    type: string?
    inputBinding:
      prefix: -O
    doc: -O STRING  read2 output file name

  split_by_lines:
    type: int?
    inputBinding:
      prefix: -S
    doc: -S INT     split output by limiting total lines of each file. output will be named 001.fqname.fastq, 002.fqname.fastq...

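outputs:
  trimmed_fastq:
    # Intended: a single File, or a File array when split_by_lines is set.
    type:
      - File
      - type: array
        items: File
    outputBinding:
      # Open question: this only matches the unsplit name. With -S the files are
      # 001.$(inputs.fastq_out), 002.$(inputs.fastq_out), ... n.$(inputs.fastq_out)
      glob: $(inputs.fastq_out)

  trimmed_fastq2:
    # Intended: absent without fastq2, otherwise the same cases as trimmed_fastq.
    type:
      - "null"
      - File
      - type: array
        items: File
    outputBinding:
      # Open question: may match nothing, or follow the same rules as above.
      glob: $(inputs.fastq2_out)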
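
One possible way to cover both the split and unsplit cases is to always declare the outputs as File arrays and glob with a leading wildcard. This is a rough sketch, not a tested solution. It assumes that split files keep the requested output name as a suffix (as in 001.myfastq_R1_.trimmed.fastq.gz), that the output names are plain file names in the working directory, and that the runner accepts an empty array returned from a glob expression.

```yaml
# Sketch only: would replace the outputs section of the tool above.
# InlineJavascriptRequirement is needed for the expression in trimmed_fastq2.
requirements:
  InlineJavascriptRequirement: {}

outputs:
  trimmed_fastq:
    type: File[]
    outputBinding:
      # "*" also matches the empty string, so this pattern finds both
      # <fastq_out> (unsplit) and NNN.<fastq_out> (split with -S).
      glob: "*$(inputs.fastq_out)"

  trimmed_fastq2:
    type: File[]
    outputBinding:
      # Without fastq2_out there is nothing to look for, so glob no patterns
      # and return an empty array (assumed to be accepted by the runner).
      glob: '$(inputs.fastq2_out ? "*" + inputs.fastq2_out : [])'
```

With this layout an unsplit run would return a one-element array rather than a bare File, so downstream steps only have to deal with one type. One caveat: the wildcard patterns overlap if one output name is a suffix of the other (for example trimmed.fastq.gz and R2.trimmed.fastq.gz), so the two names should be chosen to avoid that.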

CWL is now using this Discourse site for support. You may want to post there.

Ah thanks, should I delete the post?

Someone may respond here, but you could delete it if you want to.

Thanks, I have posted there, but my post is on hold pending moderator approval. I will delete this one if it gets approved. I appreciate the redirect!
