Best practice for renaming a file in a CWL workflow
6.1 years ago
ionox0

I've been using this method to rename files at the end of a step:

  processed_fastq_1:
    type: File
    outputBinding:
      glob: ${ return '**/' + inputs.fastq1.basename }
      outputEval: |
        ${
          self[0].basename = inputs.add_rg_SM + '_R1.fastq.gz';
          return self[0]
        }

But I'm not sure whether this is an acceptable method. Should there instead be an intermediate step, using Python or calling a script, to rename files in the middle of a CWL workflow?


I think either method is acceptable, and each comes with its own benefits and pitfalls. For example, one benefit of an intermediate renaming step is that it would be compatible with a greater number of execution engines, such as Cromwell, cwlexec, Snakemake, etc. By creating an intermediate step you might avoid some of the limitations an engine imposes. I am speaking only from my experience with cwlexec, but I know that expression would not be interpreted by cwlexec for job submissions to a cluster.

The obvious benefit of your approach is that, if you use an engine on which you know it works, you save an unnecessary intermediate step.
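To illustrate the intermediate-step option, here is a minimal CommandLineTool sketch that copies a File to a new name using only parameter references, so it does not rely on InlineJavascriptRequirement at all. It assumes CWL v1.0; the input names src and new_name and the output name renamed are hypothetical. Note that cp writes a second copy of the data.

```yaml
cwlVersion: v1.0
class: CommandLineTool
# Runs: cp <staged input> <new_name>, writing into the output directory.
baseCommand: cp
inputs:
  src:
    type: File
    inputBinding:
      position: 1
  new_name:
    type: string
    inputBinding:
      position: 2
outputs:
  renamed:
    type: File
    outputBinding:
      # Plain parameter reference, no JavaScript needed.
      glob: $(inputs.new_name)
```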


cwlexec does support InlineJavascriptRequirement, so there is no compatibility issue with the asker's proposal.

If you've experienced a problem with cwlexec then that should be reported to https://github.com/IBMSpectrumComputing/cwlexec/issues?q=is%3Aopen+is%3Aissue and to your IBM support contact.


I have been working with IBM quite a bit on bug fixes regarding what they do and don't support. I work with cwlexec every day in our lab to develop pipeline automation. In theory, they support InlineJavascriptRequirement and ShellCommandRequirement, but in practice cwlexec is quite picky. If you don't use cwlexec, this conversation doesn't really matter :).


Is there any way to do this after the fact in an ExpressionTool?

6.1 years ago

Hello ionox0,

That is an acceptable way to rename, yes. However, ** is not part of the POSIX glob definition, so you'll need to amend that. Perhaps the following simplification works with your tool?

  processed_fastq_1:
    type: File
    outputBinding:
      # Quoted because a plain YAML value cannot begin with "*" (it marks an alias).
      glob: "*/$(inputs.fastq1.basename)"
      outputEval: |
        ${
          self[0].basename = inputs.add_rg_SM + '_R1.fastq.gz';
          return self[0]
        }

It has been requested that there be an easier way to rename output Files, but no new syntax has been proposed yet. Perhaps you have a suggestion?

https://github.com/common-workflow-language/common-workflow-language/issues/668

The same link shows how to rename a File between steps.
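A minimal sketch of such a between-steps rename as an ExpressionTool, assuming CWL v1.0 and an engine that honors a modified basename when it stages the File for the next step (the CWL File specification says a File made available to a CommandLineTool must be named with its basename). The input names file and new_name and the output name renamed are hypothetical. Unlike a cp-based step, this passes along the same underlying file rather than writing a copy.

```yaml
cwlVersion: v1.0
class: ExpressionTool
requirements:
  InlineJavascriptRequirement: {}
inputs:
  file: File
  new_name: string
outputs:
  renamed: File
expression: |
  ${
    // Reuse the incoming File object; only its basename changes.
    var f = inputs.file;
    f.basename = inputs.new_name;
    return {"renamed": f};
  }
```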


Thanks very much @mr-c!

My suggestion for changing the output file name would be something like:

outputs:
  out_thing:
    glob: "*.fastq"
    newName: my_new_name.fastq

If I have time, I could look further into how to implement this feature.


Thank you! The following syntax works:

  processed_fastq_1:
    type: File
    outputBinding:
      glob: $('*/' + inputs.fastq1.basename)
      outputEval: |
        ${
          self[0].basename = inputs.add_rg_SM + '_R1.fastq.gz';
          return self[0]
        }
6.1 years ago

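I would suggest appending a simple mv command to the end of the command line via arguments, e.g.:

# shellQuote: false only takes effect under ShellCommandRequirement.
requirements:
  ShellCommandRequirement: {}
arguments:
  - position: 1000
    shellQuote: false
    valueFrom: ' && mv $(inputs.fastq1.basename) $(inputs.add_rg_SM)_R1.fastq.gz'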

Then you can simply glob $(inputs.add_rg_SM)_R1.fastq.gz.
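The matching output declaration might look like this, reusing the asker's output name processed_fastq_1. Because the glob is a plain parameter reference, it needs no JavaScript; this is a sketch and assumes the mv target name shown in the arguments above.

```yaml
outputs:
  processed_fastq_1:
    type: File
    outputBinding:
      # Must match the name the mv command writes in the output directory.
      glob: $(inputs.add_rg_SM)_R1.fastq.gz
```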

This way you don't create duplicate files, as would be the case with a separate step. Also, other users of the tool can see that renaming happens just by looking at the command line.


Yes, this looks like a good solution, and is perhaps better practice than modifying the basename of the output itself.

I also wonder whether an entirely separate CWL step would be an even more robust, cleaner solution.
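For comparison, a separate renaming step could be wired into a workflow roughly as follows. This is a sketch assuming CWL v1.0, a hypothetical process.cwl that produces processed_fastq_1, and a hypothetical rename.cwl that takes a File input file and a string new_name and returns renamed. The new name is built with a step-level valueFrom, which requires StepInputExpressionRequirement.

```yaml
cwlVersion: v1.0
class: Workflow
requirements:
  StepInputExpressionRequirement: {}
inputs:
  fastq1: File
  add_rg_SM: string
outputs:
  final_fastq:
    type: File
    outputSource: rename/renamed
steps:
  process:
    run: process.cwl   # hypothetical processing tool
    in:
      fastq1: fastq1
    out: [processed_fastq_1]
  rename:
    run: rename.cwl    # hypothetical rename step (inputs: file, new_name)
    in:
      file: process/processed_fastq_1
      new_name:
        source: add_rg_SM
        # self is the value of add_rg_SM; produces e.g. <SM>_R1.fastq.gz
        valueFrom: $(self)_R1.fastq.gz
    out: [renamed]
```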
