Question

CWL to check the output directory and run for non-existing files


6.0 years ago

a.james ▴ 240

Hello All,

I have a CWL script that should merge the graph files produced by the previous step. I need CWL to check the output directory and merge those graphs. My CWL script is shown below. The input is an array of BAM files.

I need the CWL command-line tool to check the existing output directory and run only the step that merges the files already generated there. Instead, it starts from the beginning, that is, from the step that generates a graph for each BAM file, which is time-consuming.

cwlVersion: v1.0
class: CommandLineTool
doc: Spladder

baseCommand: [python2.7, /usr/python/spladder.py]

hints:
  cwltool:InplaceUpdateRequirement:
    inplaceUpdate: true
requirements:
 - class: InlineJavascriptRequirement
 - class: InitialWorkDirRequirement
   listing: 
    - entry: "$({class: 'Directory', listing: []})"
      entryname: $(inputs.spladder_outDir)
      writable: true

inputs:
 spladder_gtf: 
  type: File
  inputBinding:
   position: 3
   prefix: -a
 spladder_bams: 
  type: File[]
  inputBinding:
   position: 1
   prefix: -b
  secondaryFiles: .bai
 spladder_outDir:
  type: string
  inputBinding:
   position: 2
   prefix: -o
 spladder_phase2:
  type: string
  inputBinding:
   position: 6
   prefix: -T
 spladder_merge_graphs:
  type: string
  inputBinding:
    position: 5
    prefix: -M
 spladder_primary_alignment:
  type: string
  inputBinding:
    position: 10
    prefix: -P
 spladder_confidence:
  type: int
  inputBinding:
    position: 4
    prefix: -c
 spladder_alt:
  type: string
  inputBinding:
    position: 7
    prefix: -t
 spladder_validate:
  type: string
  inputBinding:
    position: 8
    prefix: -V
 spladder_RL:
  type: int
  inputBinding:
    position: 9
    prefix: -n

outputs:
 spladder_out:
  type: Directory
  outputBinding:
   glob: $(inputs.spladder_outDir)/spladder

$namespaces:
  cwltool: http://commonwl.org/cwltool#

And the YML file used for the above script looks like following,

spladder_gtf: 
 class: File
 path: /usage_examples/gencode.v19.annotation.hs37d5_chr.spladder.gtf
spladder_outDir: /Alignment/spladder_out/
spladder_out_dir1: /spladder_out1
spladder_out_dir2: /spladder_out2
spladder_bams: [
 {class: File, path: /Alignment/C3N-02289_10_L1Aligned.sortedByCoord.out.bam},
 {class: File, path: /Alignment/C3N-02289_4_5_L1Aligned.sortedByCoord.out.bam},
 {class: File, path: /cluster/work/grlab/projects/alva_temp/Alignment/C3N-02671_08_L1Aligned.sortedByCoord.out.bam}
]
spladder_confidence: 2
spladder_merge_graphs: merge_graphs
spladder_alt: alt_3prime
spladder_RL: 100
spladder_phase2: y
spladder_primary_alignment: y

And I ran cwltool as follows:

 cwltool --enable-ext /spladder_part1.cwl /part2.yml

My aim is for the CWL tool to look into spladder_outDir and just merge the existing outputs from the previous run/step. Currently spladder_outDir contains 17 graph files, and I need CWL to merge them, as requested by the spladder_merge_graphs parameter. Instead, CWL starts from the beginning and creates all the graphs again if no absolute path is given. If an absolute path is given, it fails with:

FileExistsError: [Errno 17] File exists: '/spladder_out/spladder'

If a relative path is given, it reports:

WARNING: Output directory ./spladder_out does not exist - will be created

Any help or suggestions would be great. I have read the CWL manual end to end a couple of times, and I saw

cwltool:InplaceUpdateRequirement:
    inplaceUpdate: true

and --enable-ext; both of them looked like they would provide the right solution.

If I run everything in a single run, the processing time is three times longer. That is why I wanted to do the merging as a separate second run.

CWL RNA-seq next-gen • 2.6k views

5.7 years ago by a.james ▴ 240

score 1 · Answer 1 · 2019-01-23

Hi! If your problem still exists, I would very much like to help. However, I am not sure I understood what your tool is supposed to do, probably because I don't know anything about spladder. Is it correct that the "previous step" you mentioned is part of a workflow, and that the tool you posted here only has the purpose of merging the files?

I am by no means an expert in CWL. That being said, I am not sure InitialWorkDirRequirement can be used in the way you are attempting for this tool.

You might instead try giving a subdirectory of runtime.outdir (the temporary output directory CWL uses during runtime) to spladder as the input parameter for its output directory. That way you still know exactly where your files are during runtime, so you can catch the ones you need with glob. This might look like:
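For comparison, InitialWorkDirRequirement is more commonly used to stage a directory that is passed in as a Directory input, rather than to create one from a string. The sketch below assumes a hypothetical input named spladder_prev_out that points at the directory holding the already generated graphs; whether spladder then reuses those graphs instead of rebuilding them is an assumption about spladder, not something CWL controls. The remaining inputs of the original tool (-a, -c, -T and so on) are left out for brevity.

```yaml
# Sketch only: "spladder_prev_out" is a hypothetical input name, and the
# other inputs of the original tool are omitted here.
cwlVersion: v1.0
class: CommandLineTool
doc: Spladder, merge step run on an existing output directory

baseCommand: [python2.7, /usr/python/spladder.py]

requirements:
  - class: InitialWorkDirRequirement
    listing:
      # Stage the existing directory into the working directory;
      # writable: true lets the tool add files to the staged directory.
      - entry: $(inputs.spladder_prev_out)
        writable: true

inputs:
  spladder_prev_out:
    type: Directory
  spladder_bams:
    type: File[]
    secondaryFiles: .bai
    inputBinding:
      position: 1
      prefix: -b
  spladder_merge_graphs:
    type: string
    inputBinding:
      position: 5
      prefix: -M

arguments:
  # Pass the staged directory to -o by its name inside the working directory.
  - prefix: -o
    valueFrom: $(inputs.spladder_prev_out.basename)
    position: 2

outputs:
  spladder_out:
    type: Directory
    outputBinding:
      # Collect the same directory, now including the merged results.
      glob: $(inputs.spladder_prev_out.basename)
```

In the job file, such an input would be written as an object with class: Directory and a path, in the same way the BAM files are written with class: File.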

[...]
requirements:
 - class: InlineJavascriptRequirement

arguments:
  -  valueFrom: $(runtime.outdir+"/spladder_output")
     prefix: -o
     position: 2

inputs:
[...]
# remove spladder_outDir from the inputs
[...]
outputs:
 spladder_out:
  type: Directory
  outputBinding:
   glob: $(runtime.outdir+"/spladder_output")
[...]

I don't know what the output of spladder looks like. Let's say it's a bunch of ".example" files, which spladder puts into a subdirectory called "blurb". Then you might alternatively catch the output as an array of files using:

outputs:
  spladder_out:
    type: File[]
    outputBinding:
      glob: $(runtime.outdir+"/spladder_output/blurb/*.example")
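Since the CWL specification resolves glob patterns relative to the output directory, the same pattern can also be written without the runtime.outdir prefix. The following stand-alone sketch does not call spladder at all: it uses sh to create two placeholder files under the hypothetical spladder_output/blurb directory, only to show how a relative glob collects them as a File[].

```yaml
# Sketch only: the directory and file names are the hypothetical ones used
# above; sh stands in for spladder so the glob can be tried on its own.
cwlVersion: v1.0
class: CommandLineTool

baseCommand: [sh, -c]
arguments:
  # One shell command string: create the subdirectory and two placeholder files.
  - "mkdir -p spladder_output/blurb && touch spladder_output/blurb/a.example spladder_output/blurb/b.example"

inputs: []

outputs:
  spladder_out:
    type: File[]
    outputBinding:
      # Relative pattern, resolved against the output directory of the run.
      glob: spladder_output/blurb/*.example
```

Saved under any file name and run with cwltool, this should report both placeholder files in spladder_out, which makes it easy to check a glob pattern before using it in the real tool.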

Please write if this still produces problems or if I misunderstood the issue altogether. Regards, Tom