Why does the base quality of reads generally decrease at the end of the read?
8.8 years ago

Why does the base quality of reads generally decrease toward the end of the read? I have learned that it can also affect alignment, but why does it happen in the first place?

I know it has something to do with the sequencing chemistry. Can somebody please explain?

base quality • 9.3k views
8.8 years ago
memory_donk ▴ 380

The specific answer depends on which platform you're using, but I'll go out on a (short) limb and guess it's Illumina. If so, the drop-off is caused by phasing errors.

With Illumina, DNA fragments are first bound to a flow cell. A well-prepared flow cell has even spacing between all DNA fragments. Before sequencing, the DNA fragments are amplified with a technique called bridge amplification, resulting in a cluster of copies of the same DNA molecule at each spot. Ideally no clusters overlap (this is important for telling clusters apart). In each cycle, the sequencer washes the flow cell with all 4 nucleotides, each fluorescently labeled and carrying a blocker (a reversible terminator), so that only 1 base gets added to each molecule of DNA at a time. The clusters are imaged, and the label and blocker are then removed before the next cycle. Different clusters may add different bases, but within a cluster it should always be the same base.

This is how things work in a perfect world. In reality, a few molecules in each cluster will likely fail to add a nucleotide. So let's say we're on cycle 50 of 150. In this cycle, 10 out of 1000 molecules fail to add a new nucleotide (an A). In the next cycle (cycle 51), when the other 990 molecules add a G, the 10 that failed last cycle add the A instead. From now on they will be at least 1 cycle behind the rest, polluting the light signal that the sequencer's camera has to read. In cycle 52, some more molecules fall behind the main group, and some from the group that was already behind fall even further behind.

You can see that by cycle 150, several percent of the molecules may well be out of sync with the cycle number, so the last bases of the read will have a less pure light signal than the first ones. The purity of this signal is the information Illumina sequencers use to calculate quality scores. New chemistries are largely intended to minimize this phasing problem, increasing the length of reads before quality begins to drop.
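To make the accumulation concrete, here is a minimal sketch of the effect described above. It assumes a deliberately simplified model that is not Illumina's actual signal processing: every molecule has the same fixed chance of failing to extend in each cycle, a molecule that has fallen behind never catches up, and molecules running ahead are ignored. The function name and the per-cycle failure rates are hypothetical, chosen only for illustration.

```python
def in_phase_fraction(cycles, p_fail):
    """Return the fraction of a cluster still in sync after each cycle.

    Simplified model (an assumption, not the real chemistry): each molecule
    independently fails to add a base with probability p_fail per cycle,
    and once it has fallen behind it stays out of sync.
    """
    fractions = []
    in_sync = 1.0
    for _ in range(cycles):
        # Only molecules that were in sync AND extended stay in sync.
        in_sync *= (1.0 - p_fail)
        fractions.append(in_sync)
    return fractions


if __name__ == "__main__":
    # Hypothetical per-cycle failure rates, for illustration only.
    for p_fail in (0.0005, 0.001, 0.01):
        fractions = in_phase_fraction(150, p_fail)
        for cycle in (1, 50, 100, 150):
            out_of_sync = 1.0 - fractions[cycle - 1]
            print(f"p_fail={p_fail}: cycle {cycle:3d}, "
                  f"out of sync = {out_of_sync:.1%}")
```

Under this model the in-sync fraction can only shrink from one cycle to the next, which is why the signal is cleanest at the start of the read and noisiest at the end.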

8.8 years ago
piet ★ 1.9k

Because you are doing sequencing by synthesis. The more cycles you run, the more errors you accumulate.


Can you please elaborate? Why is it specific to the ends of reads, then?


One cycle = one base for a given read. The cycles toward the end of a read are therefore the bases near the end of a read. For example, cycle 147 of a 300-cycle MiSeq kit (2 x 150 bp reads) is the 147th bp of a 150 bp read (for read 1, that is; the 147th base of read 2 would be cycle 297, ignoring any index-read cycles).
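The cycle-to-base bookkeeping can be sketched in a few lines. This assumes a paired-end run with two reads of equal length and ignores any index-read cycles; the function name is hypothetical.

```python
def cycle_to_read_position(cycle, read_length=150):
    """Map a sequencing cycle to (read number, base position in that read).

    Assumes two reads of equal length and no index-read cycles in between.
    """
    if not 1 <= cycle <= 2 * read_length:
        raise ValueError("cycle is outside the run")
    read = 1 if cycle <= read_length else 2
    # Read 2 starts counting again from position 1.
    position = cycle - (read - 1) * read_length
    return read, position


if __name__ == "__main__":
    print(cycle_to_read_position(147))  # a late cycle of read 1
    print(cycle_to_read_position(151))  # the first cycle of read 2
```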

If each cycle has a small, but measurable, error in the incorporation of nucleotides, the ends of the reads will have the most errors.

A similar situation exists with oligo synthesis. Each cycle has an error rate associated with incorporating a new base, so the proportion of oligos that are truncation products grows with every cycle. Thus, long oligos have a lower percentage of full-length product.
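The oligo analogy follows the same compounding arithmetic. As a rough sketch, assume every coupling step succeeds with the same efficiency and that an oligo of n bases needs n - 1 couplings (taking the first base to be already attached to the support). The function name and the 99% efficiency are hypothetical values for illustration.

```python
def full_length_fraction(length, coupling_efficiency):
    """Estimate the fraction of oligos that reach full length.

    Assumes a constant per-step coupling efficiency and length - 1
    coupling steps (first base pre-attached to the support).
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    return coupling_efficiency ** (length - 1)


if __name__ == "__main__":
    # Hypothetical 99% coupling efficiency per step.
    for length in (20, 50, 100):
        fraction = full_length_fraction(length, 0.99)
        print(f"{length}-mer: {fraction:.1%} full-length")
```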
