Input type array and output type File: possible in CWL?
6.4 years ago
ttom

I am using scatter to run the fastqc_check step of the workflow, where the input is an array, and I am trying to capture all of the results in one file. When the output file is kept as a File, only the results from the last file in the input array get stored in the output file.

 fastqc_check_out:
  type: File[]
  outputSource: fastqc_check/fastqc_check_out

I would like to collect the results from all of the files in the input array into one output file. When outputs/fastqc_check_out is declared as a File instead, it throws an error. Is there a way to do this?

cat qc.cwl (Workflow)

cwlVersion: v1.0
class: Workflow
requirements:
 - class: ScatterFeatureRequirement

inputs:
 reads1:
  type: File[]
 reads2:
  type: File[]
 fastqc_check_script:
  type: File
 sample:
  type: string
outputs:
 fastqc_out:
  type: File[]
  outputSource: fastqc/fastqc_zip
 fastqc_html:
  type: File[]
  outputSource: fastqc/fastqc_html
 fastqc_check_out:
  type: File[]
  outputSource: fastqc_check/fastqc_check_out

steps:
 fastqc:
  run: fastqc.cwl
  in:
   fq1:
    source: reads1
   fq2:
    source: reads2
  out: [fastqc_zip, fastqc_html]
 fastqc_check:
  run: fastqc_check.cwl
  in:
   sample: sample
   fq1_zips: fastqc/fastqc_zip
  scatter: fq1_zips
  out: [fastqc_check_out]

cat fastqc_check.cwl

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [sh, fastqc_check.sh]

inputs:
 sample:
  type: string
  inputBinding:
   position: 1
 fq1_zips:
  type: File
  inputBinding:
   position: 2
outputs:
 fastqc_check_out:
  type: File
  outputBinding:
   glob: $(inputs.sample)_fastqc.summary

cat fastqc.cwl

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [fastqc, -o, .]

inputs:
 fq1: 
  type: File[]
  inputBinding:
   position: 1
 fq2:
  type: File[]
  inputBinding:
   position: 2
outputs:
 fastqc_zip:
  type: File[]
  outputBinding:
   glob: '*.zip'
 fastqc_html:
  type: File[]
  outputBinding:
   glob: '*.html'
6.4 years ago

Hi, it seems to me that you are doing everything right regarding CWL. A possible problem is that when you scatter fastqc_check.cwl, one job is created per zip file, and each job outputs a file with the same name, $(inputs.sample)_fastqc.summary. Try changing the fastqc_check.cwl tool to output a file name based on the input file name, e.g. $(inputs.fq1_zips.nameroot).$(inputs.sample)_fastqc.summary or something similar, just to make sure that the files have different names. Keep in mind that the glob only picks up files that already exist, so fastqc_check.sh itself has to write the summary under that new name.
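A minimal sketch of that renaming, assuming (this is not stated in the thread) that fastqc_check.sh prints its summary to stdout. CWL's `stdout:` field then names the captured file per job, so every scattered job produces a file with a distinct name. If the script writes its own file instead, keep the `glob:` and have the script use the same name.

```yaml
cwlVersion: v1.0
class: CommandLineTool
# Assumption: fastqc_check.sh prints its summary to stdout.
baseCommand: [sh, fastqc_check.sh]

inputs:
  sample:
    type: string
    inputBinding:
      position: 1
  fq1_zips:
    type: File
    inputBinding:
      position: 2

outputs:
  fastqc_check_out:
    type: stdout   # shortcut: the output is the file named by `stdout:` below

# One distinct name per scattered job: <zip name without extension>.<sample>_fastqc.summary
stdout: $(inputs.fq1_zips.nameroot).$(inputs.sample)_fastqc.summary
```

Only parameter references are used here, so no InlineJavascriptRequirement is needed.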

If you want to merge the summary outputs, you have to either modify fastqc_check.cwl to take a list and merge the outputs itself, or add a third tool at the end that merges the output array.
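A sketch of the first option, assuming a hypothetical version of fastqc_check.sh that accepts several zip paths after the sample name and writes one combined summary file. Because `fq1_zips` is now `File[]`, the tool receives every zip in a single job and no scatter is needed.

```yaml
cwlVersion: v1.0
class: CommandLineTool
# Hypothetical non-scattered variant: one job receives all zips at once.
# Assumption: fastqc_check.sh accepts multiple zip paths and writes one summary.
baseCommand: [sh, fastqc_check.sh]

inputs:
  sample:
    type: string
    inputBinding:
      position: 1
  fq1_zips:
    type: File[]   # each file is passed as a separate argument after the sample
    inputBinding:
      position: 2

outputs:
  fastqc_check_out:
    type: File
    outputBinding:
      glob: $(inputs.sample)_fastqc.summary
```

In qc.cwl the fastqc_check step would then drop `scatter: fq1_zips`, and `outputs/fastqc_check_out` could be declared as a single `File`, which is what the question originally asked for.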

Yes, that's it. The scatter creates multiple jobs, and each job writes its own output file, but all of those files have the same name, so each newly created file overwrites the previous one.

As a solution, I now give the output of each scatter job a different file name, and added another step that uses cat to concatenate those files into a single output file.

cat fastqc_check.cwl

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [sh, fastqc_check.sh]

inputs:
 fq1_zips:
  type: File
  inputBinding:
   position: 1
outputs:
 fastqc_check_out:
  type: File
  outputBinding:
   glob: $(inputs.fq1_zips.nameroot).summary

cat fastqc_summarize.cwl

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [cat]

inputs:
 sample: string
 fq1_summary:
  type: File[]
  inputBinding:
   position: 1
outputs:
 fastqc_summarize_out:
  type: stdout
stdout: $(inputs.sample)_fastqc.summary

cat qc.cwl

cwlVersion: v1.0
class: Workflow


requirements:
 - class: ScatterFeatureRequirement

inputs:
 reads1:
  type: File[]
 reads2:
  type: File[]
 fastqc_check_script:
  type: File
 sample:
  type: string
outputs:
 fastqc_out:
  type: File[]
  outputSource: fastqc/fastqc_zip
 fastqc_html:
  type: File[]
  outputSource: fastqc/fastqc_html
 fastqc_check_out:
  type: File[]
  outputSource: fastqc_check/fastqc_check_out
 fastqc_summary_out:
  type: File
  outputSource: fastqc_summarize/fastqc_summarize_out
steps:
 fastqc:
  run: fastqc.cwl
  in:
   fq1:
    source: reads1
   fq2:
    source: reads2
  out: [fastqc_zip, fastqc_html]
 fastqc_check:
  run: fastqc_check.cwl
  in:
   fq1_zips:
    source: fastqc/fastqc_zip
  scatter: fq1_zips
  out: [fastqc_check_out]
 fastqc_summarize:
  run: fastqc_summarize.cwl
  in:
   sample: sample
   fq1_summary:
    source: fastqc_check/fastqc_check_out
  out: [fastqc_summarize_out]
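One more thing worth checking: both versions of qc.cwl declare a `fastqc_check_script` input but never connect it, while the tool runs `sh fastqc_check.sh` by relative path. CWL runners such as cwltool typically run each job in a fresh working directory, so the script may not be found there. A sketch that stages the script as a `File` input instead (the input name `script` is hypothetical):

```yaml
cwlVersion: v1.0
class: CommandLineTool
# The script is passed in as a File instead of being looked up in the
# job's working directory. `script` is a hypothetical input name.
baseCommand: [sh]

inputs:
  script:
    type: File
    inputBinding:
      position: 0   # placed right after `sh`
  fq1_zips:
    type: File
    inputBinding:
      position: 1

outputs:
  fastqc_check_out:
    type: File
    outputBinding:
      glob: $(inputs.fq1_zips.nameroot).summary
```

In qc.cwl the step would add `script: fastqc_check_script` under `in:` and keep `scatter: fq1_zips`; inputs not listed in `scatter` are passed unchanged to every scattered job.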