Question

Subset reads based on Phred-score ranges

1 day ago

Mark ▴ 30

I have a read file

reads_all.fastq.gz

I'm wondering if there is an easy way to create subset files, each containing only the reads from this original file whose overall read Q-score falls within a specific range.

For example, I would like to end up with files that look like this:

reads_all.fastq.gz
reads_ltQ10.fastq.gz
reads_Q11-20.fastq.gz
reads_Q21-30.fastq.gz
reads_Q31-40.fastq.gz
reads_gtQ40.fastq.gz

I'm wondering if anyone knows if there is a simple way to do this, ideally with a command-line program.

phred sequencing ngs

updated 22 hours ago by GenoMax 149k • written 1 day ago by Mark ▴ 30

score 1 · Answer 1 · 2025-03-12

> specific overall read Q-score

Do you mean average Q score across the reads?
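The distinction matters because "average Q score" can be computed in more than one way, and the choice can move a read into a different bin. Here is a minimal sketch of two common definitions, assuming Phred+33 encoded quality strings. The function names are hypothetical, and this does not claim to reproduce how any particular tool computes its average.

```python
import math

def mean_phred(qual: str, offset: int = 33) -> float:
    """Arithmetic mean of the per-base Phred scores (0.0 for an empty string)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)

def mean_error_phred(qual: str, offset: int = 33) -> float:
    """Phred score of the mean per-base error probability."""
    if not qual:
        return 0.0
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))

qual = "II5+"  # per-base scores: Q40, Q40, Q20, Q10
print(round(mean_phred(qual), 1))        # 27.5
print(round(mean_error_phred(qual), 1))  # 15.6
```

The same read lands in the 21-30 range under the first definition and in the 11-20 range under the second, so it is worth settling which one you mean before binning.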

I don't recall a tool that bins reads by Q-score interval (that does not mean one does not exist).

You could use bbduk.sh from the BBMap suite with the following option:

minavgquality=0     (maq) Reads with average quality (after trimming) below this will be discarded.

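You can then follow that up with filterbyname.sh to cover all the intervals. For example, get an ltQ10.fq.gz file (reads with average quality below 10) and then an ltQ20.fq.gz file (below 20). Filter the reads in the Q10 file out of the Q20 file by name, and that should leave the reads with average quality between 10 and 20. Repeat for each pair of adjacent thresholds.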
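If installing BBMap is not an option, the same binning can be sketched in a single pass with plain Python. This is a different technique from the bbduk.sh plus filterbyname.sh route above, not a reimplementation of it. It assumes four-line FASTQ records, Phred+33 encoding, the arithmetic mean of per-base scores, and inclusive upper bin edges (a mean of exactly 10 goes into ltQ10). The names `bin_label` and `split_fastq` are hypothetical.

```python
import gzip
from itertools import islice

# Assumed inclusive upper bounds for each bin; labels follow the question.
BINS = [(10, "ltQ10"), (20, "Q11-20"), (30, "Q21-30"), (40, "Q31-40")]
LAST_BIN = "gtQ40"

def bin_label(qual, offset=33):
    """Return the bin label for one quality string, using the arithmetic mean."""
    mean_q = sum(ord(c) - offset for c in qual) / max(len(qual), 1)
    for upper, label in BINS:
        if mean_q <= upper:
            return label
    return LAST_BIN

def split_fastq(in_path, prefix="reads"):
    """Write each read to <prefix>_<label>.fastq.gz; return the labels used."""
    outputs = {}
    try:
        with gzip.open(in_path, "rt") as fin:
            while True:
                record = list(islice(fin, 4))  # header, sequence, '+', quality
                if len(record) < 4:
                    break
                label = bin_label(record[3].rstrip("\r\n"))
                if label not in outputs:
                    outputs[label] = gzip.open(f"{prefix}_{label}.fastq.gz", "wt")
                outputs[label].writelines(record)
    finally:
        for handle in outputs.values():
            handle.close()
    return sorted(outputs)
```

Calling `split_fastq("reads_all.fastq.gz")` would create one output file per bin that actually receives reads. Expect it to be slower than BBTools on large files, and expect small differences at the bin edges if another tool defines the average differently.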