All question mark quality scores on several studies
17 months ago
Jonathan ▴ 10

I've stumbled upon several shotgun studies where every base quality in the samples is ? (so 63 in phred+33). The first time I thought the files might have been tampered with, but I have just downloaded samples from 7 studies and 5 of them look like this, which makes me wonder.

I thought it might be a certain type of quality binning, but I can't seem to find any binning that goes that high. Here are some example studies where this happened (I haven't checked all sequences, but 100% of the sequences I've checked in those studies are all ???).

SRP403740
SRP423365
SRP424298
SRP425931
SRP434700
SRP410115
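
The spot check described above could be extended to a whole file. Below is a minimal sketch, assuming a plain 4-line-per-record FASTQ (no wrapped sequence lines); the function name `quality_char_counts` and the in-memory demo records are illustrative only and not taken from the studies listed.

```python
from collections import Counter
import io


def quality_char_counts(handle):
    """Tally every quality character in a 4-line-per-record FASTQ stream."""
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 3:  # the 4th line of each record holds the qualities
            counts.update(line.rstrip("\n"))
    return counts


# Hypothetical two-record input standing in for a downloaded file.
demo = io.StringIO("@r1\nACGT\n+\n????\n@r2\nGGCA\n+\n????\n")
counts = quality_char_counts(demo)
print(counts)  # a single key means every base carries the same score
```

For a real file, the same function can be passed an open file handle, e.g. `open("reads.fastq")`, where the path is a placeholder.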

I see no apparent pattern: the sequencer models differ (HiSeq and NovaSeq), and all these studies seem independent.

Does anyone have an explanation for this?

quality-score shotgun illumina • 1.3k views

Maybe they applied a Q30 filter before uploading the data?


Even so, is it even possible for the sequencer to have an error rate of less than 1 in a million for literally 100% of the bases?


I mean, maybe they dropped the reads with quality below Q30.


In this case that would have been a <Q63 filter; that doesn't really make sense, does it? Is it really possible to get 30M reads with quality >63 out of a MiSeq?

And again, it's 100% of bases with the exact same quality score, and this has happened over several independent studies.


I mean, the FASTQ format adds 33 to the quality score before writing it as a character, and filter tools likewise add 33 to the cutoff before filtering.
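
As a small worked illustration of that offset (a sketch assuming the standard phred+33 encoding; the helper names are made up for this example):

```python
def phred33_to_q(ch):
    """Decode one phred+33 quality character to its Phred score."""
    return ord(ch) - 33


def error_probability(q):
    """Phred definition: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


q = phred33_to_q("?")      # ord("?") is 63, and 63 - 33 = 30
p = error_probability(q)   # roughly 1 error in 1000 bases
cutoff_char = chr(30 + 33) # the character a Q30 cutoff maps to

print(q, p, cutoff_char)
```

Under this encoding the ASCII code 63 is the stored character, while the score it represents is Q30.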


Here is an example:

@SRR22424561.1/2
GTTCATCTTCTCGATTCGGCGGTGGATGTAGACGCAGTTGGCATCGAGTTTCCTGATGGTTTCGCCTTCAAGCGATACGCGGAAACGTATTACTCTGAGGGTGTATGTGTGTTGTGGTCCGATATTTACTATACATTCACGGCTATCAAA
+
??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
@SRR22424561.2/2
CGTTAAGCACATGCTGTGTGTACATGGCTTTTTTAATGATAACATAGTGTGGCGACAATGTTGTGAACCCCGAAAGTCCGGAAACGGACGGAGACGATATGATTCCTTGTTCGGTCAACCGGGTTGATGATGGCATCTATATGGAAAG
+
????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
@SRR22424561.3/2
ACCGCCTGAGCGGTAGCGTAGAGTATTTCTCGCGGAAGACCTCGGACATGCTCTAGTACAAACCGGTATCTCCTTGGCTGGGTTACTCGCGGCAGCCTATCAACGTGGGTTAGATGCGCAATGCCGGTGTCGAGATTCAGGCGTCCGCC

I looked at the data from the example you posted and I see normal scores:

$ more SRR22424561_1.fastq 
@1
AATCCAAAGATAACCGCGAAACTCCGGAAATCCCTTCCGTCAGTGACATCTGGAAAGCGGCGGACTTCAACGAACGTGAAGTATTCGACTATTATGGAATTGTATTCTTCGGACATCGCGACATGAGGCGCCTTTATCTTCGTAACG
+1
-AAFF7A-AJJJJ-FJJJJ7-FJJFJFJ7FFJJJJJ7F<F7<JJ--FF-FFFFJ<<77<---J<AJJFJJJJJFJFAAAA-FAJJ<-7-<F<FAJJJ-AJAAAFJJJ<7-F<JJJFF-7--7-7AFF7F---77FA<7-AFAA7<7A 

How did you retrieve the data?
