5.8 years ago
a.james
Dear All,
I am running my CWL workflow on two samples with toil-cwl-runner. The workflow runs forever, and the log files contain no clear error message.
This is main_parell.cwl, where all the sample logistics and the parallelisation are defined:
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: Workflow
requirements:
  - class: ScatterFeatureRequirement
  - class: SubworkflowFeatureRequirement
  - class: InlineJavascriptRequirement
inputs:
  reads1:
    type: File[]
  reads2:
    type: File[]
  sample_names:
    type: string[]
  genomeDir:
    type: Directory
  sjdbGTFfile:
    type: File
  annotation:
    type: File
  bam:
    type: File
  exp_out:
    type: string
  outputfile:
    type: File
  genome:
    type: File
outputs:
  bam_dir:
    type: Directory
    outputSource: collect/bam_dir
  count_dir:
    type: Directory
    outputSource: collect/count_dir
steps:
  pipeline_workflow:
    run: workflow.cwl
    scatter: [reads1, reads2, sample_name]
    scatterMethod: dotproduct
    in:
      reads1: reads1
      reads2: reads2
      sample_name: sample_names
      genomeDir: genomeDir
      sjdbGTFfile: sjdbGTFfile
      annotation: annotation
      bam: bam
      exp_out: exp_out
      genome: genome
      outputfile: outputfile
    out: [alignment_out, expression_out]
  collect:
    in:
      bam_files:
        source: [pipeline_workflow/alignment_out]
        linkMerge: merge_flattened
      count_files:
        source: [pipeline_workflow/expression_out]
        linkMerge: merge_flattened
    out: [bam_dir, count_dir]
    run:
      class: ExpressionTool
      id: "collect_step"
      inputs:
        bam_files: File[]
        count_files: File[]
      outputs:
        bam_dir: Directory
        count_dir: Directory
      expression: |
        ${
          return {
            "bam_dir": {
              "class": "Directory",
              "basename": "bams",
              "listing": inputs.bam_files
            },
            "count_dir": {
              "class": "Directory",
              "basename": "counts",
              "listing": [].concat.apply([], inputs.count_files)
            }
          };
        }
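For context, `scatterMethod: dotproduct` pairs the i-th elements of the scattered arrays, so each invocation of workflow.cwl receives one sample's two read files and its name. The Python sketch below only illustrates that pairing with the file names used in this post; the function `dotproduct_jobs` is a hypothetical helper, not part of CWL or Toil.

```python
def dotproduct_jobs(**arrays):
    """Pair the i-th element of every scattered array into one job dict."""
    # A dotproduct scatter expects all scattered arrays to have equal length.
    lengths = {len(values) for values in arrays.values()}
    if len(lengths) > 1:
        raise ValueError("dotproduct scatter requires arrays of equal length")
    names = list(arrays)
    return [dict(zip(names, values)) for values in zip(*arrays.values())]


# File names taken from the job file in this post.
jobs = dotproduct_jobs(
    reads1=["sample1_R1.fastq.gz", "sample2_R1.fastq.gz"],
    reads2=["sample1_R2.fastq.gz", "sample2_R2.fastq.gz"],
    sample_name=["sample1", "sample2"],
)

for job in jobs:
    print(job)
```

With two samples this yields two jobs, one per sample, which is the parallelisation the main workflow is meant to express.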
The pipeline, workflow.cwl, looks as follows:
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: Workflow
doc: "Workflow Components: alignment -> count expression -> lib-size"
requirements:
  - class: ScatterFeatureRequirement
  - class: SubworkflowFeatureRequirement
  - class: InlineJavascriptRequirement
inputs:
  reads1:
    type: File
  reads2:
    type: File
  genome:
    type: File
  genomeDir:
    type: Directory
  #outSAMattrRGline: string
  sjdbGTFfile:
    type: File
  annotation:
    type: File
  bam:
    type: File
  exp_out:
    type: string
  sample_name:
    type: string
outputs:
  alignment_out:
    type: File
    outputSource: star/star_bam
  expression_out:
    type: File
    outputSource: expressioncount/expression_out
steps:
  star:
    run: star.cwl
    in:
      genomeDir: genomeDir
      reads1: reads1
      reads2: reads2
      sjdbGTFfile: sjdbGTFfile
      outFileNamePrefix: sample_name
      runThreadN:
        default: 4
      outFilterMultimapScoreRange:
        default: 1
    out: [star_bam]
  expressioncount:
    run: count_expression.cwl
    in:
      annotation: annotation
      bam: star/star_bam
      exp_out: exp_out
    out: [expression_out]
And the yml file looks as follows:
reads1: # array of type "File"
  - class: File
    path: sample1_R1.fastq.gz
  - class: File
    path: sample2_R1.fastq.gz
reads2: # array of type "File"
  - class: File
    path: sample1_R2.fastq.gz
  - class: File
    path: sample2_R2.fastq.gz
sample_names: # array of type "File"
  - sample1
  - sample2
#outSAMattrRGline: ID::M45ZB
genomeDir:
  class: Directory
  path: hg19_hs37d5.overhang100_STAR
genome:
  class: File
  path: /genome.fa
sjdbGTFfile:
  class: File
  path: gencode.v28lift37.annotation.gtf
samples:
  - class: File
    path: sample1.counts
  - class: File
    path: sample2.counts
annotation:
  class: File
  path: gencode.v19.annotation.hs37d5_chr.gtf
bam:
  - class: File
    path: sample1.bam
  - class: File
    path: sample2.bam
exp_out:
  - sample1.counts
  - sample2.counts
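One thing that can be checked by hand is whether the keys in this job file line up with the inputs declared in main_parell.cwl. The Python sketch below does that comparison with the names transcribed from the two listings in this post; `compare_inputs` is a hypothetical helper, and the True/False flags only record whether a value is declared (or supplied) as an array. Whether any difference it reports is related to the hang is an open assumption, not an established cause.

```python
# Inputs declared in main_parell.cwl: name -> True if the declared type is an array.
declared = {
    "reads1": True, "reads2": True, "sample_names": True,
    "genomeDir": False, "sjdbGTFfile": False, "annotation": False,
    "bam": False, "exp_out": False, "outputfile": False, "genome": False,
}

# Keys in the job file: name -> True if the supplied value is a list.
provided = {
    "reads1": True, "reads2": True, "sample_names": True,
    "genomeDir": False, "genome": False, "sjdbGTFfile": False,
    "samples": True, "annotation": False, "bam": True, "exp_out": True,
}


def compare_inputs(declared, provided):
    """Return (missing, undeclared, shape_mismatch) as sorted name lists."""
    missing = sorted(set(declared) - set(provided))
    undeclared = sorted(set(provided) - set(declared))
    shape_mismatch = sorted(
        name for name in set(declared) & set(provided)
        if declared[name] != provided[name]
    )
    return missing, undeclared, shape_mismatch


missing, undeclared, shape_mismatch = compare_inputs(declared, provided)
print("declared but not in job file:", missing)
print("in job file but not declared:", undeclared)
print("array vs. single value differs:", shape_mismatch)
```

The same comparison can be repeated for workflow.cwl, whose declared inputs differ slightly from those of the main workflow.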
The issue is that toil-cwl-runner keeps running without any proper error logs or failure messages. Any help is appreciated. I ran toil-cwl-runner as follows:
toil-cwl-runner --stats --clusterStats --retryCount=0 --batchSystem=lsf --disableCaching --tmpdir-prefix ${TMP_DIR} --tmp-outdir-prefix ${TMP_OUT_DIR} --workDir ${WORK_DIR} --realTimeLogging --cleanWorkDir=never --clean=never --outdir ${OUT_DIR} --logDebug --logFile ${LOG_FILE} --writeLogs --jobStore ${JOB_STORE} main_parell.cwl moun.yml