Hi all,

I am writing a CWL CommandLineTool for fastp, which can take a single input fastq or a fastq pair and split each output fastq by number of lines. That is useful for an embarrassingly parallel workflow. I want the tool to require fastq1 as input and optionally accept fastq2 (because you can trim in single-end or paired-end mode). This means each output can be either a single file or an array of files. I have handled the optional input fastqs, but I am not sure how to express this in the outputs.

For example, if I had a fastq1 called myfastq_R1_.fastq.gz that had 20,000,000 reads, named the output fastq myfastq_R1_.trimmed.fastq.gz, and split it by 2,000,000 lines each, the outputs would be named 001.myfastq_R1_.trimmed.fastq.gz, 002.myfastq_R1_.trimmed.fastq.gz, ... up to 010.myfastq_R1_.trimmed.fastq.gz. Optionally, I could add a second fastq and do the same thing. The glob statement is the confusing part, because the output could be the specified string (myfastq_R1_.trimmed.fastq.gz) or that filename with an incrementing number prefixed to it. Here is my tool currently:

#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [fastp]
label: fastp adapter trimmer
doc: |
  fastp -i <fastq> -o <fastq_out> -I <fastq2?> -O <fastq2_out?> args.
inputs:
  ##################
  # Required input #
  ##################
  fastq:
    type: File
    inputBinding:
      prefix: -i
    doc: -i FILE read1 input file name
  fastq_out:
    type: string
    inputBinding:
      prefix: -o
    doc: -o STRING read1 output file name
  fastq2:
    type: File?
    inputBinding:
      prefix: -I
    doc: -I FILE read2 input file name
  fastq2_out:
    type: string?
    inputBinding:
      prefix: -O
    doc: -O STRING read2 output file name
  split_by_lines:
    type: int?
    inputBinding:
      prefix: -S
    doc: -S INT split output by limiting total lines of each file. output will be named 001.fqname.fastq, 002.fqname.fastq...
outputs:
  trimmed_fastq:
    type:
      - type: File
      - type: File[]
    glob: (could be $(inputs.fastq_out) or could be 001.$(inputs.fastq_out), 002.$(inputs.fastq_out)... n.$(inputs.fastq_out))
  trimmed_fastq2:
    type:
      - type: File?
      - type: array
        items: ["null", File]
    glob: (could not exist, or could possibly meet the same conditions as fastq1)
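
One possible way to express the outputs, offered as a rough sketch rather than a tested solution: always declare each output as File[] and use a wildcard glob, so an unsplit run simply yields a one-element array. This assumes downstream steps can accept an array in both cases, that the split files only ever differ from the requested name by a prefix, and that fastq_out and fastq2_out do not end with the same string (otherwise the two globs would overlap). The second glob uses a JavaScript expression, which needs InlineJavascriptRequirement; returning an empty list when fastq2_out is not set is an assumption about how the runner treats an empty glob, so check it with your runner.

```yaml
# Sketch only: replaces the requirements/outputs sections of the tool above.
requirements:
  - class: InlineJavascriptRequirement

outputs:
  trimmed_fastq:
    type: File[]
    outputBinding:
      # Quoted because a bare leading * is a YAML alias.
      # Matches both <fastq_out> and 001.<fastq_out>, 002.<fastq_out>, ...
      glob: "*$(inputs.fastq_out)"
  trimmed_fastq2:
    type: File[]
    outputBinding:
      # No pattern at all when fastq2_out was not provided (assumed to give an empty array).
      glob: '$(inputs.fastq2_out ? "*" + inputs.fastq2_out : [])'
```

With this layout a single-end, unsplit run would return trimmed_fastq as an array with one file and trimmed_fastq2 as an empty array, which avoids the File-or-File[] union entirely.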
CWL is now using this discourse site for support. You may want to post there.
Ah thanks, should I delete the post?
Someone may respond here, but you could delete it if you want to.
Thanks, I have posted there, but my post is currently on hold pending approval by a moderator. I will delete this one if my post gets approved. I appreciate the redirect!