Batch rename *fastq.gz files using a regular expression
3
1
Entering edit mode
6.5 years ago

I'm trying to get a regex to work with rename; I've tried the approach of similar answered questions here but couldn't get the results I wanted.

The files are named as such:

SR1_S90_L001_R1_001.fastq.gz 
SR1_S90_L001_R2_001.fastq.gz
Rinc_S96_L001_R1_001.fastq.gz 
Rinc_S96_L001_R2_001.fastq.gz

And I would like to retain only the information prior to the first underscore and the _R1_ or _R2_ tags, like this:

SR1_R1_.fastq.gz
SR1_R2_.fastq.gz
Rinc_R1_.fastq.gz 
Rinc_R2_.fastq.gz

Thanks in advance!

regex rename perl fastq • 5.1k views
6.5 years ago

Try brename, a safe batch-rename tool ( https://github.com/shenwei356/brename )

brename -p '^(\w+?)_.+_(R[12])_.+' -r '${1}_$2.fq.gz'    # updated

# original answer
# brename -p '^(\w+)_.+_(R[12])_.+' -r '${1}_$2.fq.gz'
# if you have already run this, you can run 'brename -u' to undo.
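
The difference between the original and the updated pattern is only the lazy quantifier in the first group. As a rough illustration (brename uses Go regular expressions; Python's `re` module is used here only as a stand-in, and the `preview` helper is a hypothetical name), the two patterns can be compared on the filenames from the question without touching any files:

```python
import re

# Filenames taken from the question.
names = [
    "SR1_S90_L001_R1_001.fastq.gz",
    "SR1_S90_L001_R2_001.fastq.gz",
    "Rinc_S96_L001_R1_001.fastq.gz",
    "Rinc_S96_L001_R2_001.fastq.gz",
]

# \w also matches '_', so a greedy first group grabs as many fields as it can.
greedy = re.compile(r"^(\w+)_.+_(R[12])_.+")
# The lazy form stops at the first underscore that still lets the rest match.
lazy = re.compile(r"^(\w+?)_.+_(R[12])_.+")


def preview(pattern, name):
    """Return the name the pattern would produce; nothing is renamed."""
    return pattern.sub(r"\g<1>_\g<2>.fq.gz", name)


for name in names:
    print(name, preview(greedy, name), preview(lazy, name), sep="\t")
```

With the greedy group the second field (S90 or S96) stays in the new name, while the lazy group keeps only the text before the first underscore.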

Almost there!

  • The first group was also capturing the second tag in the filename (e.g. _S90_), hence the addition of a second "_.+"
  • Changed the second group so that it includes the underscores around R[12]

The command with the final changes:

brename -p '^(\w+)_.+_.+(_R[12]_).+' -r '${1}$2.fastq.gz' -d
  • Included the -d for the dry run tests ;)

Thanks a bunch and congratulations on your software, Wei Shen


Thanks for pointing that out. If you have already run the old command, you can run 'brename -u' to undo it.


Yeah! I only saw that option after running the command and was amazed to find it (I couldn't test it since I had already deleted the folder XD)

Thanks also for the seqkit software, Shen Wei!

6.5 years ago
st.ph.n ★ 2.7k

Quick python solution.

#!/usr/bin/env python
import os, glob

for file in glob.glob("*.fastq.gz"):
    # test with print statement
    print file, '\t', file.split('_')[0] + '_' + file.split('_')[3] +  '_.fastq.gz'
    # uncomment to rename
    # os.rename(file, file.split('_')[0] + '_' + file.split('_')[3] +  '_.fastq.gz')

Save as rename_fastq.py; run as python rename_fastq.py in the directory containing fastq.gz files.

Not sure why you want to keep '_' after the R*


Hello!

I want to keep the '_' after the R* just to keep my sanity while running other scripts (that check for the pattern _R*_ )

I've got a syntax error while running your script:

    import os, glob for file in glob.glob("/*.fastq.gz"):
                      ^
SyntaxError: invalid syntax

I've tried replacing the double quotes with single ones, but to no avail.


The for statement should be on a new line after the import statement; it looks like it did not copy/paste correctly. I commented out the actual renaming part so you can test first and review the lines that are printed.


When running on:

python --version
Python 3.6.5 :: Anaconda, Inc.

I've got:

  File "rename_fastq.py", line 6
    print file, '\t', file.split('_')[0] + '_' + file.split('_')[3] + '_.fastq.gz'
             ^
SyntaxError: invalid syntax

But in a Python 2.7.15 environment the script runs perfectly and as intended :D Thanks for your time!


Yes, I'm still writing 2.7 syntax.
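
The SyntaxError under Python 3.6 comes from the print statement, which became a function in Python 3. A minimal Python 3 sketch of the same split-based approach could look like the following; the `new_name` helper and the `DRY_RUN` switch are illustrative names that are not part of the original script, and the code assumes the sample name is field 0 and the read tag is field 3 when splitting on '_':

```python
#!/usr/bin/env python3
import glob
import os

# Assumption: keep True to only print the mapping; set to False to rename.
DRY_RUN = True


def new_name(filename):
    """Build '<sample>_<read>_.fastq.gz' from fields 0 and 3 of the name.

    Returns None when the name has too few '_'-separated fields.
    """
    fields = filename.split("_")
    if len(fields) < 4:
        return None
    return f"{fields[0]}_{fields[3]}_.fastq.gz"


if __name__ == "__main__":
    for old in sorted(glob.glob("*.fastq.gz")):
        new = new_name(old)
        if new is None:
            print(old, "skipped (unexpected name)", sep="\t")
            continue
        # Print the old and new names so the mapping can be reviewed first.
        print(old, new, sep="\t")
        if not DRY_RUN:
            os.rename(old, new)
```

As with the original, run it in the directory containing the fastq.gz files and review the printed mapping before switching off the dry run.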

6.5 years ago

rename -n 's/(\w_).*_(R[0-9])_.*(.fastq.gz)/$1$2$3/' *.fastq.gz

or

rename -n 's/(\w+_)\w+_\w+_(\w._)\w+(.\w+)/$1$2$3/' *.fastq.gz

-n runs the command as a dry run, so no files are renamed. The option is distro-specific, so check which options rename supports on your distro; -n is available on Ubuntu 18.04. Remove -n to perform the actual rename.


Thanks!

It works as intended! I just modified it to include the underscore after the (R[0-9]) part in the second group, and changed the range to [1-2]:

rename -n 's/(\w_).*_(R[1-2]_).*(.fastq.gz)/$1$2$3/' *.fastq.gz
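
It may not be obvious why `(\w_)`, which captures a single word character plus the underscore, still preserves the whole sample name: the substitution is unanchored, so everything before the match is left untouched. A small sketch in Python illustrates this (Python's `re` stands in for Perl here and behaves the same way for this pattern; the dots in the extension are escaped, which is a change from the command above, and `preview` is a hypothetical helper):

```python
import re

# Same structure as the rename expression, with the literal dots escaped.
pattern = re.compile(r"(\w_).*_(R[1-2]_).*(\.fastq\.gz)")


def preview(name):
    """Apply the substitution once and return the resulting name."""
    return pattern.sub(r"\g<1>\g<2>\g<3>", name, count=1)


for name in ("SR1_S90_L001_R1_001.fastq.gz", "Rinc_S96_L001_R2_001.fastq.gz"):
    match = pattern.search(name)
    # The text before match.start() is outside the match and is kept as-is.
    print(name, repr(name[:match.start()]), preview(name), sep="\t")
```

For SR1 the match starts at "1_", so the leading "SR" is kept and the result is SR1_R1_.fastq.gz.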