I am having a very difficult time translating my workflows to WDL. The two-stage workflow described below is a case in point.
Suppose that I have two programs, which I'll call FIRST and SECOND.
FIRST generates N files. (The input arguments for FIRST are not important.)
SECOND processes files like those generated by FIRST. (You may assume that SECOND takes, among its arguments, the path to a file like those generated by FIRST.)
I want to implement a workflow in which FIRST generates N files and then N independent runs of SECOND process those N files in parallel, one file per run.
Can someone show me the WDL to implement a workflow with this general structure?
I should add that if I were implementing this workflow with, say, a bash script plus LSF, the script would first run FIRST, writing all of the files it generates into a single directory D (containing nothing else). It would then iterate over the files in D and, for each file, submit (via LSF) a separate run of SECOND, so that all N runs execute in parallel.
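For concreteness, here is a rough sketch of that fan-out step. It is written in Python rather than bash purely for illustration. It assumes that D already holds FIRST's N output files and nothing else, that SECOND is invoked as `SECOND <path>`, and that a plain `bsub <command>` with no extra LSF options is enough. The helper names `second_jobs` and `submit` are hypothetical.

```python
import subprocess
from pathlib import Path


def second_jobs(d, second="SECOND"):
    """Build one LSF submission command per regular file in directory d.

    Assumes d contains only FIRST's output files and that SECOND takes the
    path of one such file as its argument. Files are sorted so the order of
    submissions is deterministic.
    """
    files = sorted(p for p in Path(d).iterdir() if p.is_file())
    # One independent `bsub SECOND <file>` per file, i.e. N parallel LSF jobs.
    return [["bsub", second, str(p)] for p in files]


def submit(jobs, dry_run=True):
    """Print each command and, only if dry_run is False, run it via LSF."""
    for argv in jobs:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

With the default `dry_run=True`, `submit(second_jobs("D"))` only prints the N `bsub` commands. Passing `dry_run=False` would actually submit them.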
Unfortunately, as far as I can tell, WDL provides no support for iterating over the contents of a directory. (I find this shocking. I consider iterating over a directory a workhorse operation in bioinformatics.)