Question

CWL toil-cwl-runner for two samples: cannot find an error log or a reason for failure

5.8 years ago

a.james ▴ 240

Dear All,

I am running my CWL workflow with toil-cwl-runner on two samples. The workflow runs forever, and I cannot find any clear error message in the log files.

This is main_parell.cwl, where all the sample logistics and the parallelisation are defined:

#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: Workflow

requirements:
 - class: ScatterFeatureRequirement
 - class: SubworkflowFeatureRequirement
 - class: InlineJavascriptRequirement

inputs:
 reads1:
  type: File[]
 reads2:
  type: File[]
 sample_names:
  type: string[]
 genomeDir:
  type: Directory
 sjdbGTFfile:
  type: File
 annotation:
  type: File
 bam:
  type: File
 exp_out:
  type: string
 outputfile:
  type: File
 genome:
  type: File

outputs:
 bam_dir:
  type: Directory
  outputSource: collect/bam_dir
 count_dir:
  type: Directory
  outputSource: collect/count_dir
steps:
  pipeline_workflow:
     run: workflow.cwl
     scatter: [reads1, reads2, sample_name]
     scatterMethod: dotproduct
     in:
      reads1: reads1
      reads2: reads2
      sample_name: sample_names
      genomeDir: genomeDir
      sjdbGTFfile: sjdbGTFfile
      annotation: annotation
      bam: bam
      exp_out: exp_out
      genome: genome
      outputfile: outputfile
     out: [alignment_out, expression_out]
  collect:
    in:
      bam_files:
        source: [pipeline_workflow/alignment_out]
        linkMerge: merge_flattened
      count_files:
        source: [pipeline_workflow/expression_out]
        linkMerge: merge_flattened
    out: [bam_dir, count_dir]
    run:
      class: ExpressionTool
      id: "collect_step"
      inputs:
        bam_files: File[]
        count_files: File[]
      outputs:
        bam_dir: Directory
        count_dir: Directory
      expression: |
       ${
        return {
          "bam_dir": {
             "class": "Directory",
             "basename": "bams",
             "listing": inputs.bam_files
          },
          "count_dir": {
             "class": "Directory",
             "basename": "counts",
             "listing": [].concat.apply([], inputs.count_files)
          }
        };
        }
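
For readers less familiar with `scatterMethod: dotproduct`, the pairing that the `pipeline_workflow` step asks for can be sketched in plain Python. This is only an illustration of the scatter semantics, not how toil implements it; the helper name `dotproduct_scatter` is hypothetical, and the file names are taken from the yml job file shown later in this post.

```python
def dotproduct_scatter(**arrays):
    """Pair the i-th element of every scattered array into one job.

    Hypothetical helper, for illustration only: dotproduct requires all
    scattered arrays to have the same length and produces one job per index.
    """
    lengths = {len(values) for values in arrays.values()}
    if len(lengths) != 1:
        raise ValueError("dotproduct scatter requires arrays of equal length")
    names = list(arrays)
    return [dict(zip(names, row)) for row in zip(*arrays.values())]


# Scattered inputs of the pipeline_workflow step, with values from the job file.
jobs = dotproduct_scatter(
    reads1=["sample1_R1.fastq.gz", "sample2_R1.fastq.gz"],
    reads2=["sample1_R2.fastq.gz", "sample2_R2.fastq.gz"],
    sample_name=["sample1", "sample2"],
)

for job in jobs:
    print(job)
```

With two samples this gives two sub-workflow runs, one per sample. Inputs that are not listed under `scatter` (for example `genomeDir` or `annotation`) are passed unchanged to every run.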

The pipeline workflow.cwl looks as follows:

#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: Workflow

doc: "Workflow Components: alignment -> count expression -> lib-size"

requirements:
 - class: ScatterFeatureRequirement
 - class: SubworkflowFeatureRequirement
 - class: InlineJavascriptRequirement

inputs:
 reads1:
  type: File
 reads2:
  type: File
 genome:
  type: File
 genomeDir:
  type: Directory
 #outSAMattrRGline: string
 sjdbGTFfile:
  type: File
 annotation:
  type: File
 bam:
  type: File
 exp_out:
  type: string
 sample_name:
  type: string


outputs:
 alignment_out:
  type: File
  outputSource: star/star_bam
 expression_out:
  type: File
  outputSource: expressioncount/expression_out

steps:
  star:
    run: star.cwl
    in:
     genomeDir: genomeDir
     reads1: reads1
     reads2: reads2
     sjdbGTFfile: sjdbGTFfile
     outFileNamePrefix: sample_name
     runThreadN:
      default: 4
     outFilterMultimapScoreRange:
      default: 1
    out: [star_bam]
  expressioncount:
   run: count_expression.cwl
   in:
    annotation: annotation
    bam: star/star_bam
    exp_out: exp_out
   out: [expression_out]

The yml input file looks as follows:

    reads1:  # array of type "File"
      - class: File
        path:  sample1_R1.fastq.gz
      - class: File
        path:  sample2_R1.fastq.gz
    reads2:  # array of type "File"
      - class: File
        path:  sample1_R2.fastq.gz
      - class: File
        path:  sample2_R2.fastq.gz
    sample_names:  # array of type "string"
      - sample1
      - sample2
    #outSAMattrRGline: ID::M45ZB
    genomeDir:
      class: Directory
      path: hg19_hs37d5.overhang100_STAR
    genome:
     class: File
     path:  /genome.fa
    sjdbGTFfile:
     class: File
     path:  gencode.v28lift37.annotation.gtf
    samples:
     - class: File
       path: sample1.counts
     - class: File
       path: sample2.counts

    annotation:
     class: File
     path: gencode.v19.annotation.hs37d5_chr.gtf
    bam:
     - class: File
       path: sample1.bam
     - class: File
       path: sample2.bam
    exp_out:
      - sample1.counts
      - sample2.counts
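
As a cross-check of the job file against the `inputs` section of main_parell.cwl, here is a minimal Python sketch. The two dictionaries are transcribed by hand from the listings in this post, and the helper `compare` is hypothetical (it is not part of toil or cwltool). It only compares names and list-versus-single-value shape; whether any mismatch it reports is related to the behaviour described here is not established.

```python
# Input types declared in main_parell.cwl ("[]" marks an array type).
declared = {
    "reads1": "File[]", "reads2": "File[]", "sample_names": "string[]",
    "genomeDir": "Directory", "sjdbGTFfile": "File", "annotation": "File",
    "bam": "File", "exp_out": "string", "outputfile": "File", "genome": "File",
}

# Keys present in the yml job file: True if the value is a list, False otherwise.
provided_is_list = {
    "reads1": True, "reads2": True, "sample_names": True,
    "genomeDir": False, "genome": False, "sjdbGTFfile": False,
    "samples": True, "annotation": False, "bam": True, "exp_out": True,
}


def compare(declared, provided_is_list):
    """Report missing keys, extra keys, and list/single-value mismatches."""
    problems = []
    for name, cwl_type in declared.items():
        if name not in provided_is_list:
            problems.append(f"{name}: declared as {cwl_type} but missing from the job file")
        elif cwl_type.endswith("[]") != provided_is_list[name]:
            shape = "a list" if provided_is_list[name] else "a single value"
            problems.append(f"{name}: declared as {cwl_type} but the job file gives {shape}")
    for name in provided_is_list:
        if name not in declared:
            problems.append(f"{name}: present in the job file but not a workflow input")
    return problems


problems = compare(declared, provided_is_list)
for problem in problems:
    print(problem)
```

For the listings above, this flags `bam` and `exp_out` (declared as single values, given as lists), `outputfile` (declared but absent from the job file), and `samples` (in the job file but not declared).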

The issue is that toil-cwl-runner keeps running without producing any proper error logs or failure messages. Any help is appreciated. I ran toil-cwl-runner as follows:

toil-cwl-runner --stats --clusterStats --retryCount=0 --batchSystem=lsf --disableCaching --tmpdir-prefix ${TMP_DIR} --tmp-outdir-prefix ${TMP_OUT_DIR} --workDir ${WORK_DIR} --realTimeLogging --cleanWorkDir=never --clean=never --outdir ${OUT_DIR} --logDebug --logFile ${LOG_FILE} --writeLogs --jobStore ${JOB_STORE} main_parell.cwl moun.yml
