Question

Best practice for renaming files in a CWL workflow

6.8 years ago

ionox0 ▴ 390

I've been using this method to rename files at the end of a step:

  processed_fastq_1:
    type: File
    outputBinding:
      glob: ${ return '**/' + inputs.fastq1.basename }
      outputEval: |
        ${
          self[0].basename = inputs.add_rg_SM + '_R1.fastq.gz';
          return self[0]
        }

But I'm not sure whether this is an acceptable method. Should there instead be an intermediate step that uses Python, or calls a script, to rename files in the middle of a CWL workflow?

cwl • 4.4k views

Updated 6.8 years ago by Michael R. Crusoe ★ 1.9k • written 6.8 years ago by ionox0 ▴ 390

I think either method would be acceptable, and each comes with its own benefits and pitfalls. For example, one benefit of an intermediate renaming step is that it is compatible with a greater number of execution engines, such as Cromwell, cwlexec, Snakemake, etc. Creating an intermediate step may sidestep some of the limitations an engine imposes. I am speaking only from experience with cwlexec, but I know that this construct would not be interpreted by cwlexec when submitting jobs to a cluster.

The obvious benefit of your approach is that, if you know your engine supports it, you avoid an unnecessary intermediate step.

Reply posted 6.8 years ago by drkennetz ▴ 560
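For illustration, an intermediate renaming step of the kind described above could be a small CommandLineTool that copies its input under a new name. This is a sketch, assuming a POSIX `cp` is available in the execution environment; the input and output names (`in_file`, `new_name`, `renamed`) are hypothetical.

```yaml
cwlVersion: v1.0
class: CommandLineTool
# Hypothetical standalone rename step: runs `cp <in_file> <new_name>`
# in the output directory, then collects the copy.
baseCommand: cp
inputs:
  in_file:
    type: File
    inputBinding:
      position: 1
  new_name:
    type: string
    inputBinding:
      position: 2
outputs:
  renamed:
    type: File
    outputBinding:
      # Plain parameter reference, so no InlineJavascriptRequirement is needed.
      glob: $(inputs.new_name)
```

Because it only uses parameter references, this step should run on engines without JavaScript support; the trade-off is that it writes a second copy of the file.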

cwlexec does support InlineJavascriptRequirement, so there is no compatibility issue with the asker's proposal.

If you've experienced a problem with cwlexec then that should be reported to https://github.com/IBMSpectrumComputing/cwlexec/issues?q=is%3Aopen+is%3Aissue and to your IBM support contact.

Reply posted 6.8 years ago by Michael R. Crusoe ★ 1.9k
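For reference, the `${ ... }` blocks in the question are JavaScript expressions, so whichever engine is used, the tool has to declare the requirement below. A minimal fragment; the rest of the tool is omitted:

```yaml
requirements:
  # Needed for JavaScript inside $(...) and for ${ ... } function bodies,
  # such as the outputEval that sets self[0].basename.
  InlineJavascriptRequirement: {}
```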

I have been working with IBM quite a bit on bug fixes regarding what they do and don't support. I work with cwlexec every day in our lab to develop pipeline automation. In theory, it supports InlineJavascriptRequirement and ShellCommandRequirement, but in practice cwlexec is quite picky. If you do not use cwlexec, this conversation doesn't really matter :).

Reply posted 6.8 years ago by drkennetz ▴ 560

Is there any way to do this after the fact in an ExpressionTool?

Reply posted 6.1 years ago by alanh ▴ 170

score 2 · Accepted Answer · 2018-10-22

6.8 years ago

Michael R. Crusoe ★ 1.9k

Hello ionox0,

That is an acceptable way to rename, yes. However, ** is not part of the POSIX glob definition, so you'll need to amend that. Perhaps the following simplification works with your tool?

  processed_fastq_1:
    type: File
    outputBinding:
      glob: "*/$(inputs.fastq1.basename)"  # quoted: a bare leading * would be parsed as a YAML alias
      outputEval: |
        ${
          self[0].basename = inputs.add_rg_SM + '_R1.fastq.gz';
          return self[0]
        }

There have been requests for an easier way to rename output Files, but no new syntax has been proposed yet. Perhaps you have a suggestion?

https://github.com/common-workflow-language/common-workflow-language/issues/668

The same link shows how to rename a File between steps.

Answer posted 6.8 years ago by Michael R. Crusoe ★ 1.9k
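Since the linked issue is not reproduced here, a rename between steps can be sketched as an ExpressionTool, which also bears on the ExpressionTool question above. This is a minimal sketch, assuming the engine stages the File into the next step under its modified `basename`; the names `in_file`, `new_name` and `renamed` are hypothetical.

```yaml
cwlVersion: v1.0
class: ExpressionTool
requirements:
  InlineJavascriptRequirement: {}
inputs:
  in_file: File
  new_name: string
outputs:
  renamed: File
expression: |
  ${
    // Reuse the incoming File object and change only its basename;
    // its location is left as is. nameroot/nameext are not updated here.
    var f = inputs.in_file;
    f.basename = inputs.new_name;
    return {"renamed": f};
  }
```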

Thanks very much @mr-c!

My suggestion for changing the output file name would be something like:

outputs:
  out_thing:
    glob: "*.fastq"  # quoted so YAML does not read * as an alias
    newName: my_new_name.fastq

If I have time, I could look further into how this feature might be implemented.

Reply posted 6.7 years ago by ionox0 ▴ 390

Thanks! I've added your suggestion to https://github.com/common-workflow-language/common-workflow-language/issues/668#issuecomment-443176439

Reply posted 6.6 years ago by Michael R. Crusoe ★ 1.9k

Thank you! The following syntax works:

  processed_fastq_1:
    type: File
    outputBinding:
      glob: $('*/' + inputs.fastq1.basename)
      outputEval: |
        ${
          self[0].basename = inputs.add_rg_SM + '_R1.fastq.gz';
          return self[0]
        }

Reply posted 6.7 years ago by ionox0 ▴ 390

score 1 · Accepted Answer · 2018-10-21

6.8 years ago

bogdan.gavrilovic ▴ 250

I would suggest appending a simple mv command to the end of the command line via arguments, e.g.:

arguments:
  # shellQuote: false only takes effect when the tool declares ShellCommandRequirement
  - position: 1000
    shellQuote: false
    valueFrom: ' && mv $(inputs.fastq1.basename) $(inputs.add_rg_SM)_R1.fastq.gz'

Then you can simply glob for $(inputs.add_rg_SM)_R1.fastq.gz.

This way you don't create duplicate files, as you would with a separate step. Also, other users of the tool can see that renaming happens just by looking at the command line.

Answer posted 6.8 years ago by bogdan.gavrilovic ▴ 250
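Put together, a complete tool using this approach might look like the following. It is a sketch: `process_fastq` is a hypothetical command standing in for whatever tool writes a file named `$(inputs.fastq1.basename)` into the working directory, and `ShellCommandRequirement` is declared because `shellQuote: false` and the `&&` only behave as intended when the command line is run through a shell.

```yaml
cwlVersion: v1.0
class: CommandLineTool
requirements:
  # Run the command line through a shell so that && chains the mv.
  ShellCommandRequirement: {}
# Hypothetical processing command; replace with the real tool.
baseCommand: process_fastq
inputs:
  fastq1:
    type: File
    inputBinding:
      position: 1
  add_rg_SM: string
arguments:
  # A high position keeps the rename at the very end of the command line.
  - position: 1000
    shellQuote: false
    valueFrom: ' && mv $(inputs.fastq1.basename) $(inputs.add_rg_SM)_R1.fastq.gz'
outputs:
  processed_fastq_1:
    type: File
    outputBinding:
      glob: $(inputs.add_rg_SM)_R1.fastq.gz
```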

Yes, this looks like a good solution, and perhaps better practice than modifying the basename of the output itself.

I also wonder whether an entirely separate CWL step would be an even more robust/cleaner solution.

Reply posted 6.7 years ago by ionox0 ▴ 390
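For comparison, wiring a separate rename step into a workflow might look like this. A sketch only: `process_fastq.cwl` and `rename_file.cwl` are hypothetical files, the latter standing for any rename tool with `in_file`/`new_name` inputs and a `renamed` output. `StepInputExpressionRequirement` is what permits `valueFrom` on a step input.

```yaml
cwlVersion: v1.0
class: Workflow
requirements:
  # Needed for valueFrom on the rename step's input.
  StepInputExpressionRequirement: {}
inputs:
  fastq1: File
  add_rg_SM: string
outputs:
  final_fastq:
    type: File
    outputSource: rename/renamed
steps:
  process:
    run: process_fastq.cwl   # hypothetical processing tool
    in:
      fastq1: fastq1
    out: [processed_fastq_1]
  rename:
    run: rename_file.cwl     # hypothetical rename tool (in_file, new_name -> renamed)
    in:
      in_file: process/processed_fastq_1
      new_name:
        source: add_rg_SM
        # Parameter-reference interpolation, e.g. "sampleA" -> "sampleA_R1.fastq.gz"
        valueFrom: $(self)_R1.fastq.gz
    out: [renamed]
```

Keeping the rename in its own step leaves the processing tool untouched, at the cost of an extra step in the workflow graph.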