Hello All,
I have a CWL script which should merge the graph files produced by the previous step. I need CWL to check the output directory and merge those graphs. My CWL script is shown below. The input is an array of BAM files.
I need the CWL command-line tool to check the existing output directory and run only the step that merges the files already generated in it. At the moment it does not do that; instead it starts from the beginning, that is, from the step that generates a graph for each BAM file, which takes a lot of processing time.
cwlVersion: v1.0
class: CommandLineTool
doc: Spladder
baseCommand: [python2.7, /usr/python/spladder.py]
hints:
  cwltool:InplaceUpdateRequirement:
    inplaceUpdate: true
requirements:
  - class: InlineJavascriptRequirement
  - class: InitialWorkDirRequirement
    listing:
      - entry: "$({class: 'Directory', listing: []})"
        entryname: $(inputs.spladder_outDir)
        writable: true
inputs:
  spladder_gtf:
    type: File
    inputBinding:
      position: 3
      prefix: -a
  spladder_bams:
    type: File[]
    inputBinding:
      position: 1
      prefix: -b
    secondaryFiles: .bai
  spladder_outDir:
    type: string
    inputBinding:
      position: 2
      prefix: -o
  spladder_phase2:
    type: string
    inputBinding:
      position: 6
      prefix: -T
  spladder_merge_graphs:
    type: string
    inputBinding:
      position: 5
      prefix: -M
  spladder_primary_alignment:
    type: string
    inputBinding:
      position: 10
      prefix: -P
  spladder_confidence:
    type: int
    inputBinding:
      position: 4
      prefix: -c
  spladder_alt:
    type: string
    inputBinding:
      position: 7
      prefix: -t
  spladder_validate:
    type: string
    inputBinding:
      position: 8
      prefix: -V
  spladder_RL:
    type: int
    inputBinding:
      position: 9
      prefix: -n
outputs:
  spladder_out:
    type: Directory
    outputBinding:
      glob: $(inputs.spladder_outDir)/spladder
$namespaces:
  cwltool: http://commonwl.org/cwltool#
And the YML file used for the above script looks like the following:
spladder_gtf:
  class: File
  path: /usage_examples/gencode.v19.annotation.hs37d5_chr.spladder.gtf
spladder_outDir: /Alignment/spladder_out/
spladder_out_dir1: /spladder_out1
spladder_out_dir2: /spladder_out2
spladder_bams: [
  {class: File, path: /Alignment/C3N-02289_10_L1Aligned.sortedByCoord.out.bam},
  {class: File, path: /Alignment/C3N-02289_4_5_L1Aligned.sortedByCoord.out.bam},
  {class: File, path: /cluster/work/grlab/projects/alva_temp/Alignment/C3N-02671_08_L1Aligned.sortedByCoord.out.bam}
]
spladder_confidence: 2
spladder_merge_graphs: merge_graphs
spladder_alt: alt_3prime
spladder_RL: 100
spladder_phase2: y
spladder_primary_alignment: y
And I ran cwltool as:
cwltool --enable-ext /spladder_part1.cwl /part2.yml
Now my aim is that the CWL tool looks into spladder_outDir and just merges the existing outputs from the previous run/step. Currently spladder_outDir has 17 graph files and I need CWL to merge them together, as requested by the parameter spladder_merge_graphs.
On the contrary, CWL starts from the beginning and creates all the graphs again if no absolute path is given. If an absolute path is given, it says:
FileExistsError: [Errno 17] File exists: '/spladder_out/spladder'
If an absolute path is not given, I get:
WARNING: Output directory ./spladder_out does not exist - will be created
Any help or suggestions would be great. I have read the CWL manual end to end a couple of times. I saw
cwltool:InplaceUpdateRequirement:
  inplaceUpdate: true
and --enable-ext, and I am using both, but they are not giving me the right solution.
If I run everything in one go, the processing time is three times longer. That is why I wanted to do the merging part as a second, separate run.
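To make the goal concrete: the InitialWorkDirRequirement entry in the script above stages an empty directory literal, so the tool starts with no graph files. A minimal sketch of the alternative would be to pass the existing directory as a Directory input and stage that one as writable. The input name spladder_prev_out and the staged name spladder_out are hypothetical, and the rest of the tool (including the -o binding, which would have to point at the staged name) is left out. Whether spladder.py then skips the per-BAM graph step depends on the tool itself and is not verified here.

```yaml
# Sketch only: fragment of a CommandLineTool, not a complete script.
requirements:
  - class: InitialWorkDirRequirement
    listing:
      # Stage the directory from the previous run instead of an empty literal.
      - entry: $(inputs.spladder_prev_out)   # hypothetical input name
        entryname: spladder_out              # hypothetical relative name in the working directory
        writable: true
inputs:
  spladder_prev_out:
    type: Directory
```

The matching entry in the YML file would then be a Directory object pointing at the existing output path:

```yaml
spladder_prev_out:
  class: Directory
  path: /Alignment/spladder_out
```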
@Tom Thanks for your time and reply. I will take a look into your solution. I tried this solution, but it is not giving me what I need.