FASTQ files with very high per-base sequencing quality scores
8.4 years ago
Ivan S • 0

Hello,

I am currently working with FASTQ files from exome sequencing at 150x coverage. After running FastQC on these files, I observe quite high quality scores (~35 on average) with a very narrow distribution across all positions. This seems a little suspicious to me. Since I have very little experience with this type of data, I'd like to ask: is it normal to observe such high quality scores?

Thank you for your help

sequencing Fastq Quality scores • 3.5k views

That is normal. You can even see that in the example FastQC report: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/good_sequence_short_fastqc.html


Thanks a lot, I hadn't noticed this same tendency in the example report


You can analyze the quality scores empirically if you want, via mapping; BBMap has several options for that:

bbmap.sh ref=hg19.fa in=reads.fq.gz mhist=mhist.txt qahist=qahist.txt qhist=qhist.txt

mhist generates a histogram of matches and mismatches by base position; qhist gives claimed and measured quality per position; and qahist gives the quality-score accuracy (claimed versus observed). Sometimes the quality scores are quite accurate, sometimes not; it depends on a lot of factors including luck. But if you suspect they are wrong, it's nice to validate that.

Note that humans, being diploid with a roughly 1/1000 SNP rate, have a noise floor of around 30dB for these testing methods - they work better on haploids. But they will still be fairly accurate up to Q30.
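To make the noise-floor point concrete, here is a rough Python sketch of the arithmetic. It assumes the standard Phred definition (Q = -10 log10 of the error probability) and a simplified model in which the mismatch rate against the reference is just the sequencing error rate plus the real variant rate. The function names and the additive model are my own illustration, not anything BBMap exposes.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability claimed by a Phred quality score."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality score corresponding to an error probability."""
    return -10 * math.log10(p)


def measured_quality(matches: int, mismatches: int) -> float:
    """Empirical quality from counts of matching and mismatching mapped bases.

    Hypothetical helper: returns infinity when no mismatches were seen.
    """
    if mismatches == 0:
        return float("inf")
    return error_prob_to_phred(mismatches / (matches + mismatches))


def apparent_quality(claimed_q: float, variant_rate: float = 1e-3) -> float:
    """Quality a mapping-based check would report under the simplified model.

    Assumption: observed mismatch rate ~= sequencing error rate + variant rate,
    so real variants are counted as if they were sequencing errors.
    """
    return error_prob_to_phred(phred_to_error_prob(claimed_q) + variant_rate)


if __name__ == "__main__":
    for q in (20, 30, 35, 40):
        print(f"claimed Q{q}: apparent Q{apparent_quality(q):.1f}")
```

Under this model the apparent quality can never exceed Q30 with a 1/1000 variant rate, however good the reads really are, which is why claimed scores around Q35 cannot be fully confirmed this way on human data.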


That is not surprising if the libraries are of good quality (and the read length is no more than 150).


Suspicious data you say? ಠ_ರೃ

Can we see the clues too?


It depends on which sequencing technology you used. If your data is from an Illumina HiSeq, I would say the quality is as expected. But if your data is from Nanopore, I would also find it suspicious.

8.4 years ago
Brice Sarver ★ 3.8k

Data I've analyzed from current sequencing platforms usually have excellent per-base quality scores. Though not always the case, I see larger 'dips' in quality scores at the beginning and end positions much less frequently than back in the earlier Illumina/454 days. You probably just have good data!
