How can I add Sample Identifier to paired fastq file names
5.2 years ago
Tawny ▴ 180

I have over 500 paired fastq files. They have been received from a source where the Sample Identifier (S1, S2, S3) is no longer in the file names. I need to add a Sample Identifier to my paired file names for processing using QIIME2.

Here are some example file names:

Tube211-16S_L001_R1_001.fastq
Tube211-16S_L001_R2_001.fastq
Tube212-16S_L001_R1_001.fastq
Tube212-16S_L001_R2_001.fastq
Tube213-16S_L001_R1_001.fastq
Tube213-16S_L001_R2_001.fastq

I would like to add sequential Sample Identifiers to these so that they would look like this when finished:

Tube211-16S_S1_L001_R1_001.fastq
Tube211-16S_S1_L001_R2_001.fastq
Tube212-16S_S2_L001_R1_001.fastq
Tube212-16S_S2_L001_R2_001.fastq
Tube213-16S_S3_L001_R1_001.fastq
Tube213-16S_S3_L001_R2_001.fastq

I tried the following command, but it just adds S1 to all of the R1 file names:

for((k=1;k<=516;k++)); do for i in *.fastq; do mv "$i" "`echo $i | sed "s/_16S_L001_R1/-16S_S${k}_L001_R1/"`"; done; done

I need to add the same Sample Identifier to the R1 and R2 paired file names.

How can this be done?

$ ls
Tube211-16S_L001_R1_001.fastq  Tube212-16S_L001_R1_001.fastq  Tube213-16S_L001_R1_001.fastq
Tube211-16S_L001_R2_001.fastq  Tube212-16S_L001_R2_001.fastq  Tube213-16S_L001_R2_001.fastq

$ ls *R2*.fastq | sort|  nl -nln | sed 's/^/S/;s/\s\+/_/' | rename -n 's/(.*)_(.*)-(.*)_(.*)/$2_$1_$3_$4/'

rename(S1_Tube211-16S_L001_R2_001.fastq, Tube211_S1_16S_L001_R2_001.fastq)
rename(S2_Tube212-16S_L001_R2_001.fastq, Tube212_S2_16S_L001_R2_001.fastq)
rename(S3_Tube213-16S_L001_R2_001.fastq, Tube213_S3_16S_L001_R2_001.fastq)
5.2 years ago
Tawny ▴ 180

I ended up making a slight change to colin.kern's answer. It needed arithmetic expansion to properly increment the variable k. Here is the command that ended up working for me:

k=1; for i in *.fastq; do mv "$i" "`echo $i | sed "s/_16S_L001_R1/-16S_S${k}_L001_R1/"`"; k=$((k+1)); done
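As written, the loop above steps k on every file that *.fastq matches, and the sed pattern only touches R1 names. Here is a minimal sketch that loops over the R1 files only and gives the matching R2 file the same number. It assumes the `_L001_R1_` / `_L001_R2_` naming shown in the question; the function name `add_sample_ids` and the scratch directory are hypothetical and only there for illustration.

```bash
#!/usr/bin/env bash
# Sketch: give each R1/R2 pair the same sequential sample identifier.
# Assumes names like Tube211-16S_L001_R1_001.fastq, as in the question.
add_sample_ids() {
    local dir=$1 k=1 r1 r2
    for r1 in "$dir"/*_L001_R1_*.fastq; do
        [ -e "$r1" ] || continue                 # glob matched nothing
        r1=${r1##*/}                             # strip the directory part
        r2=${r1/_L001_R1_/_L001_R2_}             # name of the mate file
        mv -- "$dir/$r1" "$dir/${r1/_L001_R1_/_S${k}_L001_R1_}"
        if [ -e "$dir/$r2" ]; then               # rename the mate if present
            mv -- "$dir/$r2" "$dir/${r2/_L001_R2_/_S${k}_L001_R2_}"
        fi
        k=$((k + 1))                             # one step per pair
    done
}

# Demo on empty placeholder files in a scratch directory
demo=$(mktemp -d)
for t in 211 212 213; do
    touch "$demo/Tube${t}-16S_L001_R1_001.fastq" "$demo/Tube${t}-16S_L001_R2_001.fastq"
done
add_sample_ids "$demo"
ls "$demo"
```

The numbers follow the shell's glob sort order, so it is worth trying this on a copy of the files first and checking that the order is the one you want.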
5.2 years ago
colin.kern ★ 1.1k

It doesn't work because the inner loop (going through all the fastq files) is done completely on the first iteration of the outer loop (when k=1). You can just increment k yourself in the loop:

k=1; for i in *.fastq; do mv "$i" "`echo $i | sed "s/_16S_L001_R1/-16S_S${k}_L001_R1/"`"; k=$k+1; done

Also, you can actually do search and replace on variables directly in bash:

k=1; for i in *.fastq; do mv "$i" "${i/_16S_L001_R1/-16S_S${k}_L001_R1}"; k=$k+1; done
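For anyone following along, here is a small sketch of two details in the commands above: how a plain assignment differs from arithmetic expansion when incrementing a counter, and how the `${var/pattern/replacement}` form rewrites a file name without sed. The variable names `n` and `renamed` are only for illustration, and the pattern assumes the file names from the question.

```bash
#!/usr/bin/env bash
# A plain assignment concatenates text; $(( )) evaluates arithmetic.
k=1
k=$k+1
echo "string append: $k"      # k now holds the text 1+1

n=1
n=$((n + 1))
echo "arithmetic: $n"         # n now holds the number 2

# ${var/pattern/replacement} replaces the first match, no sed needed.
i=Tube211-16S_L001_R1_001.fastq
renamed=${i/_L001_R1/_S${n}_L001_R1}
echo "$renamed"
```

The braces in `${n}` matter here: without them bash would look for a variable named `n_L001_R1`.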

@colin.kern thank you for offering these two solutions. I did have to modify them slightly (see my answer below) by using arithmetic expansion.

Tawny : It is fair that you acknowledge their help by at least upvoting this answer.

5.2 years ago

try brename

# read1
$ brename -f 'R1.+fastq$' -p _L -r '_S{nr}_L' -d 
[INFO] checking: [ ok ] 'Tube211-16S_L001_R1_001.fastq' -> 'Tube211-16S_S1_L001_R1_001.fastq'
[INFO] checking: [ ok ] 'Tube212-16S_L001_R1_001.fastq' -> 'Tube212-16S_S2_L001_R1_001.fastq'
[INFO] checking: [ ok ] 'Tube213-16S_L001_R1_001.fastq' -> 'Tube213-16S_S3_L001_R1_001.fastq'
[INFO] 3 path(s) to be renamed

# read2
$ brename -f 'R2.+fastq$' -p _L -r '_S{nr}_L' -d 