Subset reads based on Phred-score ranges
1 day ago
Mark ▴ 30

I have a read file:

reads_all.fastq.gz

I'm wondering if there is an easy way to create subset files that contain only the reads from this original file that fall into a specific overall read Q-score range.

For example, I would like to end up with files that look like this:

reads_all.fastq.gz
reads_ltQ10.fastq.gz
reads_Q11-20.fastq.gz
reads_Q21-30.fastq.gz
reads_Q31-40.fastq.gz
reads_gtQ40.fastq.gz

Does anyone know of a simple way to do this, ideally with a command-line program?

phred sequencing ngs • 159 views
22 hours ago
GenoMax 149k

specific overall read Q-score

Do you mean average Q score across the reads?
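The distinction matters because "average Q score" can be computed in more than one way, and the two common definitions can put the same read in different bins. Here is a minimal Python sketch of both, assuming Phred+33 encoded quality strings; the function names are made up for illustration and are not taken from any particular tool.

```python
import math

def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]

def mean_phred(quality_string):
    """Arithmetic mean of the per-base Phred scores."""
    scores = phred_scores(quality_string)
    return sum(scores) / len(scores)

def mean_error_phred(quality_string):
    """Phred score corresponding to the mean per-base error probability."""
    scores = phred_scores(quality_string)
    mean_p = sum(10 ** (-q / 10) for q in scores) / len(scores)
    return -10 * math.log10(mean_p)

qual = "IIII++++"  # four bases at Q40 followed by four bases at Q10
print(mean_phred(qual))                   # 25.0
print(round(mean_error_phred(qual), 2))   # 13.01
```

The same read would land in a Q21-30 bin under the first definition and in a Q11-20 bin under the second, so it is worth checking which one a given tool uses before comparing results.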

I don't recall a tool (that does not mean one does not exist) that will do binning based on Q score intervals.

You could use bbduk.sh from the BBMap suite with the following option:

minavgquality=0     (maq) Reads with average quality (after trimming) below this will be discarded.

You can then follow that up by using filterbyname.sh to build all the intervals. For example, generate an ltQ10.fq.gz file and then an ltQ20.fq.gz file. Filter the reads in the Q10 file out of the Q20 file, and what remains should be the reads with average quality between 10 and 20. Repeat for each pair of adjacent cutoffs.
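If you would rather not chain several filtering runs, the same binning can be sketched in a single pass with plain Python. This is not a BBTools feature, just a rough standalone illustration. It assumes a 4-line-per-record FASTQ, Phred+33 encoding, and the arithmetic mean of the per-base scores as the "overall" read quality. The bin edges are half-open (a mean of exactly 10 goes to the Q11-20 file, exactly 40 goes to gtQ40), which is my assumption about how you want the boundaries handled. The names `bin_label` and `split_by_mean_quality` are hypothetical.

```python
import bisect
import gzip

# Cutoffs and labels follow the file names in the question.
THRESHOLDS = [10, 20, 30, 40]
LABELS = ["ltQ10", "Q11-20", "Q21-30", "Q31-40", "gtQ40"]

def bin_label(mean_q):
    """Map a mean quality to a label; bins are [.., 10), [10, 20), ..., [40, ..)."""
    return LABELS[bisect.bisect_right(THRESHOLDS, mean_q)]

def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ."""
    while True:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:  # end of file (or an unexpected blank header line)
            return
        yield tuple(record)

def split_by_mean_quality(in_path, out_prefix, offset=33):
    """Write each read to <out_prefix>_<label>.fastq.gz; return per-bin read counts."""
    outputs = {}
    counts = {}
    try:
        with gzip.open(in_path, "rt") as handle:
            for header, seq, plus, qual in read_fastq(handle):
                if not qual:  # skip zero-length reads, which have no mean quality
                    continue
                mean_q = sum(ord(ch) - offset for ch in qual) / len(qual)
                label = bin_label(mean_q)
                if label not in outputs:
                    # Open each output file lazily, only when a read needs it.
                    outputs[label] = gzip.open(f"{out_prefix}_{label}.fastq.gz", "wt")
                outputs[label].write(f"{header}\n{seq}\n{plus}\n{qual}\n")
                counts[label] = counts.get(label, 0) + 1
    finally:
        for out in outputs.values():
            out.close()
    return counts
```

Calling `split_by_mean_quality("reads_all.fastq.gz", "reads")` would write files such as reads_ltQ10.fastq.gz and reads_Q11-20.fastq.gz next to the input and return the number of reads in each. Bins that receive no reads produce no file. Results may differ from a tool that averages error probabilities rather than Phred scores.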
