I'm working with hundreds of files at once and want to process them through two kinds of steps:
- Within-file preprocessing (one file in, one file out).
- Across-file analyses (e.g. clustering: many files in, one or a few files out).
My questions are:
a. Is there a way in CWL to do something analogous to Makefile wildcards? For example, can a CWL workflow specify that a tool should run once on each file in a directory?
b. Is there something similar at the tool level? The solution we came up with for clustering was to pass in a CSV file listing each filename to read (along with some metadata for the subsequent heatmap). The potential problem with this approach is that the CWL runner isn't aware of all the files the tool is actually working on.
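To make the workaround in (b) concrete, here is a minimal sketch of the tool side in Python. The column names (`filename`, `condition`) and the sample file names are hypothetical and chosen only for illustration; our real manifest has different metadata columns.

```python
import csv
import io


def read_manifest(handle):
    """Parse a CSV manifest with one row per input file.

    Assumes a 'filename' column (hypothetical name); every other column
    is treated as metadata for that file.
    """
    paths = []
    metadata = {}
    for row in csv.DictReader(handle):
        path = row.pop("filename")
        paths.append(path)
        metadata[path] = dict(row)  # remaining columns, e.g. heatmap labels
    return paths, metadata


# In-memory stand-in for the manifest file that gets passed to the tool.
example = io.StringIO(
    "filename,condition\n"
    "sample_01.tsv,control\n"
    "sample_02.tsv,treated\n"
)
paths, metadata = read_manifest(example)
print(paths)
```

Only the manifest itself is declared as an input, so the files named inside it are opened by the tool without the runner knowing about them. That is exactly the issue described above.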
I tried looking through the examples, and couldn't find a good one for either of these cases.
It looks like the same thing is also being asked in "How process inputs based on a filename pattern using CWL".
Michael has said that this is under development, but not yet implemented.