Removing reads from fastq file based on position
8.8 years ago
firestar ★ 1.6k

I have a FASTQ file. I know that the quality lines at, say, positions 1142354 and 1145663 have an incorrect length or a bad format. How do I remove those two lines together with the rest of their records (i.e., if one line is removed, the other three lines of that record must be removed as well) and save the result to a new file?

I currently use this to get positions with incorrect read lengths.

awk '{if(NR%4==2) print NR"\t"$0"\t"length($0)}' input.fastq > input-readLength
awk '{if(NR%4==0) print NR"\t"$0"\t"length($0)}' input.fastq > input-qualityLength
awk 'NR==FNR{a[$3]++;next}!a[$3]' input-readLength input-qualityLength
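Note that the third command compares the lengths as a set, so a quality line is reported only if its length matches no read anywhere in the file. A per-record comparison could be sketched as below. The function name `report_length_mismatches` is made up for illustration, and the sketch assumes strict 4-line records (no wrapped sequence or quality lines).

```shell
# Hypothetical helper: report_length_mismatches FASTQ
# Prints "quality_line_number <TAB> sequence_length <TAB> quality_length"
# for every record whose sequence and quality lengths differ.
# Assumption: every record is exactly 4 lines long.
report_length_mismatches() {
    awk 'FNR % 4 == 2 { seqlen = length($0) }            # remember the sequence length
         FNR % 4 == 0 && length($0) != seqlen {          # compare with the quality line
             print FNR "\t" seqlen "\t" length($0)
         }' "$1"
}
```

Something like `report_length_mismatches input.fastq | cut -f1` would then give one line number per bad record.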
RNA-Seq next-gen-sequencing • 1.9k views
8.8 years ago
GenoMax 147k

Once you know the line positions, use sed to delete them.
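A minimal sketch of the sed approach, assuming strict 4-line records. A flagged line number may point at any of the four lines of a record, so each number is first widened to the full record range and sed then deletes those ranges. The function name `drop_fastq_records` and the file names in the usage line are hypothetical.

```shell
# Hypothetical helper: drop_fastq_records BAD_LINES FASTQ
# BAD_LINES holds one line number per line; each number may point at any
# of the four lines of a record. Assumption: strict 4-line FASTQ records.
drop_fastq_records() {
    # Turn each line number into a "first,last d" sed command for its record,
    # e.g. 8 -> "5,8d" and 6 -> "5,8d".
    script=$(awk 'NF { s = $1 - ($1 - 1) % 4; print s "," (s + 3) "d" }' "$1")
    # If BAD_LINES is empty the script is empty and sed copies the input unchanged.
    sed -e "$script" "$2"
}
```

Usage would be along the lines of `drop_fastq_records bad_lines.txt input.fastq > filtered.fastq`. For just a couple of records the ranges can also be typed by hand, e.g. `sed '5,8d;17,20d' input.fastq > filtered.fastq`, where the numbers are only placeholders.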
