Question

Use filename found in an input Directory in a CommandLineTool

6.7 years ago

alanh ▴ 170

We're writing CWL descriptions for a couple of different tools that need a FASTA reference but refer to it in different ways. For example, STAR takes a --genomeDir as input and searches that directory for the files it expects, whereas BWA and Picard take the actual FASTA filename as input and then look for the associated indices derived from that filename.

I presume that I can use JavaScript or some other glob method on the input directory to search for *.fasta or *.fa and insert the matching filename, but I don't know how to do that.

Currently I can handle the BWA/Picard cases with something like this (Picard example):

  - id: reference_fasta
    type: File
    secondaryFiles:  [".amb",".ann",".bwt",".fai",".pac",".sa"]
    inputBinding: 
      position: 1
      prefix: R=
      separate: false

And the STAR case:

  - id: genomeDir
    type: Directory
    inputBinding:
      position: 1
      prefix: "--genomeDir"

I'd like to be able to use the genomeDir parameter in both cases and have it search for something like *.fasta or *.fa and insert the match into the command line.

E.g., the directory listing for a hypothetical "ref.fa" might look something like this:

 ref.fa # BASE fasta
 ref.fa.amb ref.fa.ann ref.fa.bwt ref.fa.fai ref.fa.pac ref.fa.sa # BWA/Samtools indices
 SA SAindex Genome # STAR indices
 ref.dict  # Picard .dict
 ref.genome # Bedtools Genome file

We would have a separate directory for each genome (e.g. mm10, hg38, etc.), so we could pass just the directory to the CWL tool, whether the underlying program calls for the FASTA file or for the path to the directory.

  - id: genomeDir
    type: Directory
    inputBinding:
      position: 1
      prefix: R=
      separate: false
      valueFrom: |
        ${
          return /* MAGIC SEARCH FOR *.fa(sta) happens here */;
        }

(Pretty new to CWL here and haven't found an example workflow that includes this kind of expansion.)

cwl • 2.0k views

updated 6.7 years ago by bogdan.gavrilovic ▴ 250 • written 6.7 years ago by alanh ▴ 170

score 3 · Accepted Answer · 2018-04-13

Hi. The Directory input has a listing property, which you can access in a JavaScript expression as inputs.genomeDir.listing. It returns a list of the File and Directory objects contained in that directory. These objects have the same properties as regular File objects (path, basename, ...).

So in your case, for example, you can get the .fa file path with something like this:

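${
    // Needs InlineJavascriptRequirement on the tool.
    var file_list = inputs.genomeDir.listing;
    for (var i = 0; i < file_list.length; i++) {
        // Match only the base FASTA (ref.fa or ref.fasta), not ref.fa.amb etc.
        if (/\.(fa|fasta)$/.test(file_list[i].basename)) {
            return file_list[i].path;
        }
    }
    return null;
}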
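
To check the matching logic outside a CWL runner, the same loop can be pulled out into a plain function and run with Node against a mock listing. This is only a sketch: findFasta is a hypothetical helper name, the /refs/mm10/... paths are made up, and the mock entries mimic the directory layout from the question rather than real runner output.

```javascript
// Hypothetical helper (not part of CWL): returns the path of the first
// FASTA file found in a Directory listing, or null when there is none.
function findFasta(listing) {
    // listing can be undefined if the runner did not load the directory contents
    var entries = listing || [];
    for (var i = 0; i < entries.length; i++) {
        var entry = entries[i];
        // Only plain files qualify; the anchored pattern skips ref.fa.amb, ref.fa.fai, ...
        if (entry.class === "File" && /\.(fa|fasta)$/.test(entry.basename)) {
            return entry.path;
        }
    }
    return null;
}

// Mock listing modelled on the question's example directory (paths are invented)
var listing = [
    { class: "File", basename: "ref.fa.amb", path: "/refs/mm10/ref.fa.amb" },
    { class: "File", basename: "ref.fa.fai", path: "/refs/mm10/ref.fa.fai" },
    { class: "File", basename: "SAindex", path: "/refs/mm10/SAindex" },
    { class: "File", basename: "ref.dict", path: "/refs/mm10/ref.dict" },
    { class: "File", basename: "ref.fa", path: "/refs/mm10/ref.fa" }
];

console.log(findFasta(listing)); // prints /refs/mm10/ref.fa
```

Inside a tool, a function like this could go in the expressionLib of InlineJavascriptRequirement and be called from valueFrom as $(findFasta(inputs.genomeDir.listing)). One caveat, as far as I know: from CWL v1.1 onward the listing is not loaded by default, so the input would likely need loadListing: shallow_listing (or a LoadListingRequirement) for inputs.genomeDir.listing to be populated.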