We're writing CWL tool descriptions for a few tools that each need a FASTA reference but take it in different ways. For example, STAR takes a --genomeDir and searches that directory for the files it expects, whereas BWA and Picard take the actual FASTA filename and then look for the associated indices, whose names are derived from that filename.
I presume that I can use JavaScript or some glob-like method on the input directory to search for *.fasta or *.fa and insert the matching filename, but I don't know how to do that.
Currently I can handle the BWA/Picard cases with something like this (Picard example):
- id: reference_fasta
  type: File
  secondaryFiles: [".amb",".ann",".bwt",".fai",".pac",".sa"]
  inputBinding:
    position: 1
    prefix: R=
    separate: false
And the STAR case:
- id: genomeDir
  type: Directory
  inputBinding:
    position: 1
    prefix: "--genomeDir"
I'd like to be able to use the genomeDir parameter in both cases and have it search the directory for something like *.fasta or *.fa, then put the matching filename on the command line.
E.g., the directory listing for a hypothetical "ref.fa" might look something like this:
ref.fa # BASE fasta
ref.fa.amb ref.fa.ann ref.fa.bwt ref.fa.fai ref.fa.pac ref.fa.sa # BWA/Samtools indices
SA SAindex Genome # STAR indices
ref.dict # Picard .dict
ref.genome # Bedtools Genome file
We would have a separate directory for each genome (e.g. mm10, hg38, etc.), and we could then just pass the directory name to the CWL regardless of whether the tool wants the FASTA file or the path to the directory. Something like:
- id: genomeDir
  type: Directory
  inputBinding:
    position: 1
    prefix: R=
    separate: false
    valueFrom: |
      ${
        return // MAGIC SEARCH FOR *.fa(sta) happens here
      }
(Pretty new to CWL here and haven't found an example workflow that includes this kind of expansion.)
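A rough, untested sketch of what I imagine the "magic" part doing, written as plain JavaScript so it can be tried outside a CWL runner. It assumes the runner fills in the `listing` field of the Directory object (as far as I can tell this depends on the CWL version and may need `loadListing: shallow_listing` on the input) and that InlineJavascriptRequirement is enabled. The helper name `findFasta` and the /refs/hg38 paths are made up for illustration.

```javascript
// Hypothetical helper: return the path of the first *.fa / *.fasta file
// found directly inside a CWL Directory object.
// Assumption: the runner has populated `dir.listing`.
function findFasta(dir) {
  var listing = dir.listing || [];
  for (var i = 0; i < listing.length; i++) {
    var entry = listing[i];
    // Anchor on the end of the name so ref.fa.fai, ref.fa.amb, etc. do not match.
    if (entry.class === "File" && /\.(fa|fasta)$/.test(entry.basename)) {
      return entry.path;
    }
  }
  throw new Error("no .fa/.fasta file found in " + dir.basename);
}

// Mock of what inputs.genomeDir might look like for the listing above
// (paths are invented for this example).
var genomeDir = {
  class: "Directory",
  basename: "hg38",
  path: "/refs/hg38",
  listing: [
    { class: "File", basename: "ref.fa.fai", path: "/refs/hg38/ref.fa.fai" },
    { class: "File", basename: "ref.fa", path: "/refs/hg38/ref.fa" },
    { class: "File", basename: "SA", path: "/refs/hg38/SA" }
  ]
};

console.log(findFasta(genomeDir));
```

Inside the CWL file the body of the loop would presumably sit in the `${ ... }` block of `valueFrom`, reading from `inputs.genomeDir` (or `self`) instead of the mock object.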