Editing FASTQ headers
2.5 years ago

Hi! I have several FASTQ files (paired-end sequencing, so I have R1 and R2 files) with different headers. For example:

@SRR8834012.1.1 MG00HS20:989:CAKP3ANXX:1:1101:1962:1989 length=101
NGGTCCTCGGCAGGCCGAGACCGGCTTCTGCGATCAAGCTGCGCTGAACCTCGCTGCTCCCGGCGTAGATCGTCGCGGCGATGTTTCGGCGCGACATCCAC

Or

@MG00HS20:989:CAKP3ANXX:1:1101:1457:56177/1
TTCCGGATCCCGTCGCGCTGATCGCGGCCTTTTCGCGTCGAGATTGCACGAATGCCGCGTAGGTTTCGCGGTGACCGAGGCC
+
AAABCGC<<C1/@C99/EGEGGD=CGGGG/:FDEEGGGGDEGGGGGGGGGG/09FGG/C:CGG=FG0:FAGG.@DEG.?E.@

I would like to edit the headers in all my FASTQ files so that each header is the name of the file followed by a number that acts as a sequence counter. So the headers for the reads in a file named test would be test_1, test_2, test_3, and so on.

Could someone help me?


You should look into bioawk (https://github.com/lh3/bioawk). I think it gives you streamlined access to these variables (the file name and the record number, i.e. the ordinal number of each read).

2.5 years ago

Here's also a seqkit replace answer that loops over all of the FASTQ files in a directory (including subdirectories, since find recurses by default).

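# Skip outputs of earlier runs (*.renamed.fastq) and quote "$1" so paths with spaces work.
find . -name "*.fastq" ! -name "*.renamed.fastq" -exec sh -c \
  'seqkit replace -p ".+" -r "$(basename "$1" .fastq)_{nr}" "$1" > "${1%.fastq}.renamed.fastq"' sh {} \;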
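If seqkit isn't installed, the same loop can be sketched with plain sh and awk. This is a minimal sketch, not part of the answer above: rename_fastq_headers is a hypothetical helper name, and it assumes uncompressed FASTQ files ending in .fastq with exactly 4 lines per record (no wrapped sequence lines). Unlike find, it only looks at the top level of the given directory.

```shell
#!/bin/sh
# Hypothetical helper: for every *.fastq file in directory $1, write
# <name>.renamed.fastq whose headers are @<name>_1, @<name>_2, ...
# Assumes uncompressed FASTQ with exactly 4 lines per record.
rename_fastq_headers() {
  for f in "$1"/*.fastq; do
    [ -e "$f" ] || continue                         # glob matched nothing
    case "$f" in *.renamed.fastq) continue ;; esac  # skip outputs of earlier runs
    name=$(basename "$f" .fastq)                    # e.g. dir/test.fastq -> test
    # Line 1 of each 4-line record is the header; other lines pass through.
    awk -v fqname="$name" \
      '{print ((NR % 4 == 1) ? "@" fqname "_" (++i) : $0)}' \
      "$f" > "${f%.fastq}.renamed.fastq"
  done
}

# Usage: rename_fastq_headers .
```

Because the counter is tied to the file name, sample_R1.fastq and sample_R2.fastq end up with different prefixes (sample_R1_1 vs sample_R2_1); if a downstream paired-end tool expects mate names to match, strip the _R1/_R2 part from name first.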
2.5 years ago
iraun 6.2k

Hi! Please consider reformatting your question to be more readable.

Have you seen the "Quick One Liner For Fastq Header Renaming" post? I think you can easily adapt the awk solution suggested there and modify it slightly so that the name of your file is used. Something like this (not tested):

cat input_name.fastq | awk -v fqname="input_name" '{print (NR%4 == 1) ? "@"fqname"_" ++i : $0}' > renamed_header.fastq
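A tiny self-contained run shows what the one-liner does. The file name test.fastq and the two reads are made up for illustration, the records are assumed to be exactly 4 lines each, and a few extra parentheses are added to the awk expression to make the parsing unambiguous:

```shell
#!/bin/sh
# Made-up two-read FASTQ file (4 lines per record) in a temporary directory.
tmp=$(mktemp -d)
printf '@read_a/1\nACGT\n+\nIIII\n@read_b/1\nTTGA\n+\nIIII\n' > "$tmp/test.fastq"

# Line 1 of every 4-line record (NR % 4 == 1) is the header: replace it with
# @<fqname>_<counter>; sequence, "+" and quality lines are printed unchanged.
awk -v fqname="test" '{print ((NR % 4 == 1) ? "@" fqname "_" (++i) : $0)}' \
  "$tmp/test.fastq" > "$tmp/renamed_header.fastq"

cat "$tmp/renamed_header.fastq"
# @test_1
# ACGT
# +
# IIII
# @test_2
# TTGA
# +
# IIII
```

The NR % 4 trick breaks on FASTQ files with wrapped sequence or quality lines; for those, a format-aware tool such as seqkit or bioawk is safer.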

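I think cat and fqname are not necessary here: awk's built-in FILENAME variable already holds the input file name. FILENAME includes the directory and the extension (e.g. test.fq), so strip those first to get headers like test_1, test_2 and so on:

$ awk 'FNR == 1 {name = FILENAME; sub(/.*\//, "", name); sub(/\.f(ast)?q$/, "", name); i = 0} {print ((FNR % 4 == 1) ? "@" name "_" (++i) : $0)}' test.fq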

Yes, of course the code can be shortened :).


I had not seen that post! Thanks for sharing, I'll try it.

2.5 years ago
$ bioawk -c fastx '{ name = FILENAME; sub(/\.f(ast)?q$/, "", name); print "@" name "_" (++c); print $seq; print "+"; print $qual }' test.fq