I am using scatter to run the fastqc_check step in a workflow where the input is an array, and I am trying to capture all the results in one file. When the output is kept as a File array (File[], as below), only the results from the last file in the input array get stored in the output file.
fastqc_check_out:
  type: File[]
  outputSource: fastqc_check/fastqc_check_out
I would like to collect the results for all the files in the input array in a single output file. When outputs/fastqc_check_out is declared as a File, it throws an error. Is there a way to do this?
cat qc.cwl (workflow)
cwlVersion: v1.0
class: Workflow
requirements:
  - class: ScatterFeatureRequirement
inputs:
  reads1:
    type: File[]
  reads2:
    type: File[]
  fastqc_check_script:
    type: File
  sample:
    type: string
outputs:
  fastqc_out:
    type: File[]
    outputSource: fastqc/fastqc_zip
  fastqc_html:
    type: File[]
    outputSource: fastqc/fastqc_html
  fastqc_check_out:
    type: File[]
    outputSource: fastqc_check/fastqc_check_out
steps:
  fastqc:
    run: fastqc.cwl
    in:
      fq1:
        source: reads1
      fq2:
        source: reads2
    out: [fastqc_zip, fastqc_html]
  fastqc_check:
    run: fastqc_check.cwl
    in:
      sample: sample
      fq1_zips: fastqc/fastqc_zip
    scatter: fq1_zips
    out: [fastqc_check_out]
cat fastqc_check.cwl
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [sh, fastqc_check.sh]
inputs:
  sample:
    type: string
    inputBinding:
      position: 1
  fq1_zips:
    type: File
    inputBinding:
      position: 2
outputs:
  fastqc_check_out:
    type: File
    outputBinding:
      glob: $(inputs.sample)_fastqc.summary
cat fastqc.cwl
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [fastqc, -o, .]
inputs:
  fq1:
    type: File[]
    inputBinding:
      position: 1
  fq2:
    type: File[]
    inputBinding:
      position: 2
outputs:
  fastqc_zip:
    type: File[]
    outputBinding:
      glob: '*.zip'
  fastqc_html:
    type: File[]
    outputBinding:
      glob: '*.html'
Yes, that is right. The scatter creates multiple jobs, and each job writes its output to its own file, although the output file name is the same in every job. Each time a file gets created, it overwrites the previous one.
As a solution, write the output of each scatter job to a different output file, and then add another step that uses cat to concatenate those files into a single output file.
cat fastqc_check.cwl
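A minimal sketch of what the per-job naming could look like, not the original listing. It assumes fastqc_check.sh is changed (hypothetically) to write its summary to a file named after both the sample and the zip it was given, so that only the glob differs from the tool in the question. `nameroot` is the CWL File field holding the file name without its last extension.

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [sh, fastqc_check.sh]
inputs:
  sample:
    type: string
    inputBinding:
      position: 1
  fq1_zips:
    type: File
    inputBinding:
      position: 2
outputs:
  fastqc_check_out:
    type: File
    outputBinding:
      # Assumption: the script writes <sample>_<zip name without extension>.summary,
      # so every scatter job produces a distinctly named file.
      glob: $(inputs.sample)_$(inputs.fq1_zips.nameroot).summary
```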
cat fastqc_summarize.cwl
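A minimal sketch of a concatenation tool along these lines, not the original listing. The input name `summaries` and the output name `fastqc_summary` are hypothetical. The tool runs cat over the array of per-job files and captures standard output as the single combined file.

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: cat
inputs:
  # Hypothetical name: the per-job summary files produced by the scattered step.
  # An array with an inputBinding puts each file path on the command line in order.
  summaries:
    type: File[]
    inputBinding:
      position: 1
  # Used only to name the combined file; it is not passed to cat.
  sample:
    type: string
# Everything cat prints is captured into this file.
stdout: $(inputs.sample)_fastqc.summary
outputs:
  # Hypothetical name: the single concatenated summary.
  fastqc_summary:
    type: stdout
```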
cat qc.cwl
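A sketch of how the workflow might be wired for this, not the original listing. It assumes a fastqc_summarize.cwl tool with a File[] input named `summaries`, a string input named `sample`, and a File output named `fastqc_summary` (all hypothetical names). The scattered step still yields an array, and the extra step collapses it, so the workflow output can be a plain File.

```yaml
cwlVersion: v1.0
class: Workflow
requirements:
  - class: ScatterFeatureRequirement
inputs:
  reads1:
    type: File[]
  reads2:
    type: File[]
  fastqc_check_script:
    type: File
  sample:
    type: string
outputs:
  fastqc_out:
    type: File[]
    outputSource: fastqc/fastqc_zip
  fastqc_html:
    type: File[]
    outputSource: fastqc/fastqc_html
  fastqc_check_out:
    # A single File now: the concatenated summary, not one file per scatter job.
    type: File
    outputSource: fastqc_summarize/fastqc_summary
steps:
  fastqc:
    run: fastqc.cwl
    in:
      fq1: reads1
      fq2: reads2
    out: [fastqc_zip, fastqc_html]
  fastqc_check:
    run: fastqc_check.cwl
    scatter: fq1_zips
    in:
      sample: sample
      fq1_zips: fastqc/fastqc_zip
    out: [fastqc_check_out]
  fastqc_summarize:
    # Hypothetical step; port names must match fastqc_summarize.cwl.
    run: fastqc_summarize.cwl
    in:
      sample: sample
      # The scattered step's output is a File[], which is what cat receives.
      summaries: fastqc_check/fastqc_check_out
    out: [fastqc_summary]
```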