Question

Resolving overclustered NGS with Q-scores

16 months ago

sam.himes92 • 0

I have just received data from an NGS run that I suspect was overclustered.

Read 1 starts with a 24 bp barcode of the following pattern:

YSKRYSKRYSKRYSKRYSKRYSKR
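As a minimal sketch of how a read could be checked against this pattern, the IUPAC codes can be expanded into a regular expression (Y = C/T, S = C/G, K = G/T, R = A/G). The function name `barcode_matches` is hypothetical, and the check assumes the barcode occupies exactly the first 24 bases of read 1.

```python
import re

# Standard IUPAC meanings of the four degenerate codes in the pattern.
IUPAC = {"Y": "[CT]", "S": "[CG]", "K": "[GT]", "R": "[AG]"}

PATTERN = "YSKR" * 6  # YSKRYSKRYSKRYSKRYSKRYSKR, 24 bp
BARCODE_RE = re.compile("".join(IUPAC[code] for code in PATTERN))


def barcode_matches(read):
    """Return True if the first 24 bases of the read fit the degenerate pattern.

    Hypothetical helper: assumes the barcode starts at position 0 of read 1.
    Reads shorter than 24 bases, or with an N in the barcode, return False.
    """
    return BARCODE_RE.fullmatch(read[:24]) is not None
```

Reads whose barcode region fails this check are candidates for mixed or miscalled clusters, independent of their quality scores.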

Following the 24 bp barcode, the sequence should be the same for every read.

Read 2 is a 19 bp sequence that should be a known mutant of a WT promoter. So most of the read 2 sequences should be very similar to each other.

I suspect the run is overclustered because the quality scores for read 1 are poor for the first 24 bp and then improve sharply. For read 2, the quality scores are much better throughout.

If the run was overclustered, it would make sense that the first 24 bp of read 1 have low quality: at each barcode position, two overlapping clusters have only a 50/50 chance of carrying the same nucleotide. Because the sequence is identical after the 24 bp barcode, it makes sense that the scores suddenly improve there, since an overloaded cluster would then give a single consistent signal. It also makes sense that read 2 has better scores throughout, because most of the mutations are single point mutations, so in general a mixed cluster would disagree at only 1 or 2 positions.
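The 50/50 argument can be made concrete with a little arithmetic. This is a sketch that assumes each barcode position is drawn uniformly and independently from its two allowed bases, which the post does not state; the variable names are illustrative only.

```python
from fractions import Fraction

BARCODE_LEN = 24
OPTIONS_PER_POSITION = 2  # each of Y, S, K, R stands for exactly two bases

# Assumption: both bases at a position are equally likely and positions are independent.
p_agree = Fraction(1, OPTIONS_PER_POSITION)

# Number of distinct barcodes the pattern can encode.
n_barcodes = OPTIONS_PER_POSITION ** BARCODE_LEN

# Expected number of positions where two random barcodes disagree.
expected_mismatches = BARCODE_LEN * (1 - p_agree)

# Probability that two random barcodes are identical at all 24 positions.
p_identical = p_agree ** BARCODE_LEN
```

Under these assumptions, two co-located clusters would be expected to disagree at about half of the 24 barcode positions, compared with the 1 or 2 positions expected in read 2.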

If possible, I would like to salvage this run. Is there a commonly recommended Phred score threshold that I can use to filter reads? If so, should that threshold be the same for both the barcode (R1) and the mutant promoter (R2)?
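For reference, a minimal sketch of what such a filter could look like, using the Phred definition Q = -10 * log10(p). It assumes Phred+33 encoded FASTQ quality strings; the function names and both threshold values are hypothetical, chosen only for illustration and not as recommendations.

```python
def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q):
    """Phred definition: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def passes_min_q(qual, start, end, min_q):
    """True if every base in qual[start:end] has a Phred score of at least min_q.

    Reads too short to cover the whole region are rejected.
    """
    scores = phred_scores(qual)[start:end]
    return len(scores) == end - start and min(scores) >= min_q


# Hypothetical thresholds, for illustration only.
R1_BARCODE_MIN_Q = 20   # applied to bases 0-23 of read 1
R2_PROMOTER_MIN_Q = 30  # applied to bases 0-18 of read 2
```

Keeping the region and threshold as parameters lets the barcode in R1 and the promoter in R2 be filtered with different cutoffs, for example `passes_min_q(r1_qual, 0, 24, R1_BARCODE_MIN_Q)` and `passes_min_q(r2_qual, 0, 19, R2_PROMOTER_MIN_Q)`.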

NGS overclustering phred Q-score • 1.2k views

Has PhiX been spiked in to increase nucleotide diversity?

Reply • 16 months ago by ATpoint 87k

Yes, at 5%.

Reply • 16 months ago by sam.himes92 • 0

score 0 · Answer 1 · 2023-12-06

16 months ago

Trivas ★ 1.9k

It might be because Illumina recommends 26 cycles for Read 1: https://knowledge.illumina.com/instrumentation/general/instrumentation-general-reference_material-list/000001891

I should have mentioned that we ran both read 1 and read 2 well past the regions of interest: 75 cycles for each read.

Reply • 16 months ago by sam.himes92 • 0

Could you tell us which sequencer you used, how you quantified your library size and concentration, and what your loading concentration was? Honestly, jumping on a call with Illumina support for advice is always a good option; make them do some work for your business.

Reply • 16 months ago by Trivas ★ 1.9k

The system we used was an AVITI. Our sequencing core handled the library quantification and loading, so sorry, I don't have those details. I am planning to talk with them, but I wanted to know whether the run was recoverable before reaching out.

Reply • 16 months ago by sam.himes92 • 0