I have a CWL workflow that runs a pair of tumor-normal samples for a given subject.
For later variant calling, I want the read names to be something like $(subjectName)_$(fastqs.sample).
The inputs look like this:
inputs:
  fastqs:
    type:
      type: array
      items:
        type: record
        fields:
          - name: sample
            type: string
          - name: files
            type:
              type: array
              items: File
  referenceFasta:
    type: File
  subjectName:
    type: string
steps:
  dna_align_and_sort:
    run: align_sort.cwl
    in:
      reference_fasta: referenceFasta
      fastq_files:
        source: fastqs
        valueFrom: $(self.files)
      sample_name:
        source: fastqs
        valueFrom: $(MAGIC LINE HERE)  # <---- WHAT GOES HERE?
    out:
      [fileInDir]
    scatter: [fastq_files, sample_name]
    scatterMethod: dotproduct
Can someone tell me what should go into the sample_name valueFrom to make this work?
I have successfully inserted fastqs.sample as the name using $(self.sample),
so I know the underlying code works.
Can you elaborate on the kind of problem your example causes? Is it related to the scattering or the referencing of subjectName in the context of the step?
In a later step, the Mutect2 somatic variant caller seems to name the FORMAT data column in its output VCF using the value in the read. In my example above, "$(self.sample)" is either "tumor" or "normal" depending on the sample type, so the reads get named "tumor" or "normal" accordingly.
The problem occurs after that when I try to build a panel of normals (PON) from the normal samples, and if they're all named the same thing (Normal, Normal, Normal, etc), the PON creation step barfs because they're all the same names.
Can adding the subjectName solve this? Only a single subject name is given to the workflow. Wouldn't they still all have the same (albeit longer) name?
Do the fastq files have unique names? If so, you could add their nameroot to the sample names to distinguish between them.
I've tried a bunch of iterations here:
valueFrom: "$(subjectName)_$(self.sample)"
valueFrom: ${return inputs.subjectName.concat("_",fastqs.sample)}
and they all fail with various issues.
Try adding "subjectName" to inputs and then referring to it this way:
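A minimal sketch of that suggestion, under a few assumptions not confirmed in the thread: CWL v1.0 or later, StepInputExpressionRequirement enabled on the workflow (it is needed for any valueFrom on a step input), and a runner that accepts the extra step input subjectName even if align_sort.cwl does not declare it. Inside a step-level valueFrom, inputs refers to the step's own "in" entries rather than to the workflow inputs, so the workflow-level subjectName has to be passed into the step first:

```yaml
requirements:
  StepInputExpressionRequirement: {}   # needed for valueFrom on step inputs
  ScatterFeatureRequirement: {}        # needed for scatter

steps:
  dna_align_and_sort:
    run: align_sort.cwl
    in:
      reference_fasta: referenceFasta
      # Assumption: extra step input, passed in only so valueFrom can read it
      subjectName: subjectName
      fastq_files:
        source: fastqs
        valueFrom: $(self.files)
      sample_name:
        source: fastqs
        # self is one scattered record; inputs.subjectName is the step input above
        valueFrom: $(inputs.subjectName)_$(self.sample)
    out: [fileInDir]
    scatter: [fastq_files, sample_name]
    scatterMethod: dotproduct
```

With a hypothetical subjectName of subj01, the two scattered jobs would be expected to receive subj01_tumor and subj01_normal. The two earlier attempts probably fail because neither subjectName nor fastqs is a bare name in the expression scope; values are reached through inputs and self. If the file names are unique, appending something like $(self.files[0].nameroot) is another option, assuming the runner populates nameroot at that point.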