Trimming Fastq File Based On First Nucleotide
10.8 years ago
Floris Brenk ★ 1.0k

Hi all,

I have several fastq files that are biased at the first sequenced base: in too many reads it is a G. What I would like is a simple script that removes the first base from each read in the fastq file, but only when that base is a G. Is this possible?

Thanks

fastq trimming • 2.7k views
10.8 years ago

See the "HEADCROP" option in Trimmomatic as one (of likely very, very many) methods.

Edit: Since you only wanted 1-base cropping if the first base is a G, then something like the following should work:

zcat something.fastq.gz | awk '{if(NR%4==2) { if(substr($0, 1, 1) == "G") { print substr($0,2); getline; print $0; getline; print substr($0,2)} else {print $0; getline; print $0; getline; print $0}} else { print $0}}' | gzip > something.trimmed.fastq.gz
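
For anyone who would rather not use awk, the same conditional trim can be sketched in Python. This is only a minimal sketch under a few assumptions: the function names (read_fastq, trim_leading_g, trim_file) are made up for illustration, the input is assumed to be strict four-line FASTQ (no wrapped sequences), and only an uppercase G counts as a match. The quality string is trimmed together with the sequence so the two stay the same length.

```python
import gzip
from typing import Iterable, Iterator, Tuple

# (header, sequence, separator, quality); hypothetical alias for this sketch
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records with trailing newlines stripped.

    Assumes one record is exactly four lines; an incomplete trailing
    record is silently dropped by zip().
    """
    it = iter(lines)
    for header, seq, sep, qual in zip(it, it, it, it):
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               sep.rstrip("\n"), qual.rstrip("\n"))


def trim_leading_g(record: Record) -> Record:
    """Drop the first base and its quality score only if the base is 'G'."""
    header, seq, sep, qual = record
    if seq.startswith("G"):
        return header, seq[1:], sep, qual[1:]
    return record


def trim_file(in_path: str, out_path: str) -> None:
    """Read a gzipped FASTQ file and write the conditionally trimmed copy."""
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        for rec in read_fastq(fin):
            fout.write("\n".join(trim_leading_g(rec)) + "\n")
```

Calling trim_file("something.fastq.gz", "something.trimmed.fastq.gz") would mirror the awk pipeline above, using the same placeholder file names.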

Thanks for your reply. But as far as I understand, this removes the first base from all reads. I want only a leading G to be removed; when the first base is an A, C or T, it should stay.

Ah, that was unclear to me. I've updated my answer with an example of how to do that with awk.

Yes, perfect! Thanks a lot!
