Rename part of a file name using a two-column CSV/TSV
2.5 years ago
Quentin • 0

My apologies if a similar question has been asked before; I am struggling to find a solution to my problem. We receive a number of sequencing files (fastq) from an MGI sequencer. The default naming scheme for each sequenced library is:

V350012345_L01_49_1.fq.gz
V350012345_L01_49_2.fq.gz
V350012345_L01_95_1.fq.gz
V350012345_L01_95_2.fq.gz

Each library is denoted by the system's default index number, i.e. 49 and 95 in the example above. I would like to replace the index number with the actual sample name, using a TSV file as input, e.g.

 V350012345_L01_sample1_1.fq.gz
 V350012345_L01_sample1_2.fq.gz
 V350012345_L01_sample2_1.fq.gz
 V350012345_L01_sample2_2.fq.gz

I do have a file with the index number and sample name and have tried to rename using the following bash command without any success.

tsv file:

49       sample1
95       sample2

command:

while read -r id name; do rename 's/"$id"/"name"/g' *.fq.gz; done < sample.tsv 

Any pointers in the right direction will be appreciated as I am currently learning bash

rename bash • 1.8k views

The solution from Matthias Zepper will work, but I want to recommend (in line with his first sentence) that whenever you rename original files, you either make a copy first in case something crashes or gets overwritten or, preferably, make a separate folder and symlink (ln -s) the original files into it. You can then test the command on the symlinks first, or simply use the symlinked files as input for your analysis. In any case, make sure that the "rawrawraw" files from the sequencing company are kept somewhere, safe and untouched.
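The symlink approach could look roughly like the sketch below. The function name link_raw_files and the directory names are hypothetical, chosen only for illustration; adapt them to your own layout.

```shell
# link_raw_files RAW_DIR WORK_DIR
# Symlink every *.fq.gz file from RAW_DIR into WORK_DIR so that renaming
# experiments only ever touch the links, never the original files.
# Both directory names are placeholders supplied by the caller.
link_raw_files() {
    local raw_dir=$1 work_dir=$2
    local raw_abs f
    raw_abs=$(cd "$raw_dir" && pwd) || return 1   # absolute path keeps links valid
    mkdir -p "$work_dir" || return 1
    for f in "$raw_abs"/*.fq.gz; do
        [ -e "$f" ] || continue                   # glob matched nothing
        ln -s "$f" "$work_dir/${f##*/}"
    done
}
```

For example, link_raw_files raw renamed would fill a folder called renamed with links to the files in raw. Renaming a link changes only the link's name, so the files in raw keep their original names.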


I don't have a command-line solution for you, but my approach would be to use Python and os.system to issue the mv commands that rename your files.


Thanks for the reply. I am open to learning Python as well, and I am interested in your solution.

2.5 years ago

I'll recommend my brename again, a practical cross-platform command-line tool for safely batch renaming files/directories via regular expression.

whenever you rename original files then either be sure to make a copy first in case something crashes or gets overwritten...

@ATpoint is right, be careful before starting renaming.

brename detects potential overwrite conflicts for you, and it provides a -d/--dry-run flag. Besides, even if you have wrongly renamed a large number of files without any error, brename can undo the last operation with the -u/--undo flag.

$ brename -p '_(\d+)(_[12].fq.gz)' -r '_{kv}$2' -k sample.tsv  -d
[INFO] read key-value file: sample.tsv
[INFO] 2 pairs of key-value loaded
[INFO] main options:
[INFO]   ignore case: false
[INFO]   search pattern: _(\d+)(_[12].fq.gz)
[INFO]   include filters: .
[INFO]   search paths: ./
[INFO] 
[INFO] checking: [ ok ] 'V350012345_L01_49_1.fq.gz' -> 'V350012345_L01_sample1_1.fq.gz'
[INFO] checking: [ ok ] 'V350012345_L01_49_2.fq.gz' -> 'V350012345_L01_sample1_2.fq.gz'
[INFO] checking: [ ok ] 'V350012345_L01_95_1.fq.gz' -> 'V350012345_L01_sample2_1.fq.gz'
[INFO] checking: [ ok ] 'V350012345_L01_95_2.fq.gz' -> 'V350012345_L01_sample2_2.fq.gz'
[INFO] 4 path(s) to be renamed
2.5 years ago

Personally, I would first write the command and then do a dry run to confirm that nothing unintended happens, i.e. run it with the -n flag of rename first:

rename -n "s/_49_/_sample1_/g;s/_95_/_sample2_/g" *.fq.gz

To automate the generation of the command, you could e.g. use awk:

awk 'BEGIN{ORS=""; print "rename -n \""}{print "s/_"$1"_/_"$2"_/g;"}END{print "\" *.fq.gz \n"}' sample.tsv 

Redirect the output to a file such as script.sh and execute it as you see fit. Once the dry run shows the expected names, remove the -n flag to actually rename the files.
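The loop in the question fails for two reasons: the single quotes stop bash from expanding $id, and "name" is missing its $. If you would rather keep a while-read loop and not depend on rename at all, a minimal sketch with plain mv could look like this. The function name rename_by_table and its --apply switch are made up for this example, and it assumes a tab- or space-separated table and file names that end in _<index>_1.fq.gz or _<index>_2.fq.gz, as in the question.

```shell
# rename_by_table TABLE [--apply]
# Reads "index sample" pairs from TABLE and renames the matching *.fq.gz
# files in the current directory. Without --apply it only prints the plan.
rename_by_table() {
    local table=$1 mode=${2:-}
    local id name f new
    # "|| [ -n "$id" ]" also handles a last line without a trailing newline
    while read -r id name || [ -n "$id" ]; do
        [ -n "$id" ] && [ -n "$name" ] || continue   # skip blank/incomplete lines
        for f in *_"${id}"_[12].fq.gz; do
            [ -e "$f" ] || continue                   # glob matched nothing
            # Keep everything before the last "_<id>_" and the read suffix after it
            new="${f%_${id}_*}_${name}_${f##*_}"
            if [ "$mode" = "--apply" ]; then
                mv -n -- "$f" "$new"                  # -n: never overwrite a file
            else
                echo "mv $f $new"
            fi
        done
    done < "$table"
}
```

Run rename_by_table sample.tsv first to inspect the printed mv lines, then rename_by_table sample.tsv --apply once they look right. Matching on _<index>_ with both underscores avoids touching the same digits elsewhere in the name, for instance inside the flow cell ID.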
