We're writing a workflow where we process a couple of BAMs separately, then run processes on them together (DNA-seq somatic calling). The workflow scatters the tumor/normal BAMs for parallel processing, producing "tumor.sorted.bam" and "normal.sorted.bam", then gathers them into an array for a realignment step that outputs "tumor.realigned.bam" and "normal.realigned.bam". (BAM index files (*.bai) are created and passed along as secondaryFiles for each of these steps, here and below, as appropriate.)
These BAMs need some post-processing (re-sorting, dealing with duplicate reads), and I'm able to scatter over them and get "tumor.realigned.md.bam" and "normal.realigned.md.bam" back as what I believe to be an array of Files.
In the next step, I try to run an ExpressionTool to convert the array of Files into two File objects with secondaryFiles, so that I can refer to the tumor and normal BAMs explicitly, but I think something is failing in my JavaScript ExpressionTool:
class: ExpressionTool
# collect_tumor_normal_bams.cwl
cwlVersion: v1.0
inputs:
  bams:
    type: File[]
    secondaryFiles: [".bai"]
outputs:
  tumorBam: File
  normalBam: File
requirements:
  InlineJavascriptRequirement: {}
expression: |
  ${
    var tumor = [];
    var tumorSecondary = [];
    var normal = [];
    var normalSecondary = [];
    for (var filenum in inputs.bams) {
      if (inputs.bams[filenum].basename.match(/tumor[^\/]*\.bam$/i)) {
        tumor.push(inputs.bams[filenum]);
      }
      if (inputs.bams[filenum].basename.match(/tumor[^\/]*\.bai$/i)) {
        tumorSecondary.push(inputs.bams[filenum]);
      }
      if (inputs.bams[filenum].basename.match(/normal[^\/]*\.bam$/i)) {
        normal.push(inputs.bams[filenum]);
      }
      if (inputs.bams[filenum].basename.match(/normal[^\/]*\.bai$/i)) {
        normalSecondary.push(inputs.bams[filenum]);
      }
    }
    tumor["secondaryFiles"] = tumorSecondary;
    normal["secondaryFiles"] = normalSecondary;
    return {"tumorBam": tumor, "normalBam": normal}
  }
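One way to check the selection logic outside the runner is to run it with Node against a mock `inputs` object. The following is only a sketch: it assumes each BAM arrives as a single File object with its .bai already attached under secondaryFiles (so the index never shows up as its own array entry), and the mock data and the helper names `pickBam` and `collectTumorNormal` are hypothetical, not part of CWL or Arvados.

```javascript
// Hypothetical stand-in for the CWL `inputs` object; the file names mirror
// the ones described above, everything else is made up for testing.
var inputs = {
  bams: [
    { class: "File", basename: "tumor.realigned.md.bam",
      secondaryFiles: [{ class: "File", basename: "tumor.realigned.md.bam.bai" }] },
    { class: "File", basename: "normal.realigned.md.bam",
      secondaryFiles: [{ class: "File", basename: "normal.realigned.md.bam.bai" }] }
  ]
};

// Return the first File whose basename matches the pattern, or null if none does.
function pickBam(files, pattern) {
  for (var i = 0; i < files.length; i++) {
    if (pattern.test(files[i].basename)) {
      return files[i];
    }
  }
  return null;
}

// Build the output object with one File (not an array) per output name.
function collectTumorNormal(inputs) {
  return {
    tumorBam: pickBam(inputs.bams, /tumor[^\/]*\.bam$/i),
    normalBam: pickBam(inputs.bams, /normal[^\/]*\.bam$/i)
  };
}

var result = collectTumorNormal(inputs);
console.log(JSON.stringify(result, null, 2));
```

If this prints one File object per output with its secondaryFiles intact, the pattern matching is probably not the problem, and the remaining suspects would be the shape of the value the expression returns and how the step input is wired in the workflow.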
Here is a snippet from the higher-level CWL that is calling the above:
  # lots of stuff above here that works
  realign:
    run: commandline/realign.cwl
    in:
      bam_files: stageForRealign/bamFiles
      reference_fasta: referenceFasta
      targets_bed: captureBed
    out:
      [bam_file]
  post_realign_sort_index_md:
    run: commandline/bamSortMarkDups.cwl
    scatter: input_file
    in:
      input_file: realign/bam_file
    out:
      [bam_file]
  # Everything works above here: I get the expected BAMs and BAIs
  # nothing below here works: I suspect this is an issue with my ExpressionTool,
  # but I don't know a good way to debug
  collectTN:
    run: expression/collect_tumor_normal_bams.cwl
    in:
      bams: [post_realign_sort_index_md/bam_file]
    out:
      [tumorBam, normalBam]
  coverage_tumor:
    run: commandline/coverage.cwl
    in:
      bam_file: collectTN/tumorBam
      bed_file: coverageWindows
      genome_file: bedtoolsGenome
    out:
      [counts_file]
  coverage_normal:
    run: commandline/coverage.cwl
    in:
      bam_file: collectTN/normalBam
      bed_file: coverageWindows
      genome_file: bedtoolsGenome
    out:
      [counts_file]
  call_somatic_variants:
    run: commandline/somatic-caller.cwl
    in:
      tumor_bam_file: collectTN/tumorBam
      normal_bam_file: collectTN/normalBam
      reference_fasta: referenceFasta
      regions_bed: captureBed
    out:
      [somatic_caller_output]
If it makes a difference, we're using the Arvados CWL runner.