Question

Duplication in DNA-seq vs RNA-seq

0

3.0 years ago

samc • 0

Why is duplication a greater concern in DNA-seq than RNA-seq?

Duplication • 1.4k views

Updated 3.0 years ago by Carlo Yague 8.9k • written 3.0 years ago by samc • 0

score 0 · Answer 1 · 2021-11-25

0

3.0 years ago

Carlo Yague 8.9k

Because in RNA-seq it is normal to have what we call biological duplicates. These are reads with exactly the same sequence that originate from the original RNA extract, not from a PCR or cluster artefact. This happens because, in a cell, some RNA molecules are much more abundant than others. They are present in so many copies that the probability of sequencing them multiple times is quite high. Such natural, biological duplication is much less common in whole-genome sequencing experiments (DNA-seq), where each chromosome is present in a single copy (or a few copies, depending on the ploidy) in each cell. Read duplication in such settings usually represents PCR duplicates or optical duplicates, two kinds of technical artefacts that are of greater concern than normal biological duplicates.
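To make the abundance argument concrete, here is a rough toy simulation, not a model of any real library. It assumes that a read is fully identified by the fragment it comes from, that DNA-seq fragments are all equally likely, and that RNA-seq fragment abundances follow a 1/rank distribution. The function name `duplicate_fraction` and all the numbers are made up for illustration, and no PCR or optical duplication is simulated at all.

```python
import random


def duplicate_fraction(weights, n_reads, seed=0):
    """Fraction of reads that repeat a fragment already sampled.

    weights: relative abundance of each distinct fragment (hypothetical).
    n_reads: number of reads drawn, with replacement, from the pool.
    """
    rng = random.Random(seed)
    draws = rng.choices(range(len(weights)), weights=weights, k=n_reads)
    return 1 - len(set(draws)) / n_reads


n_fragments = 100_000  # assumed number of distinct fragments in the sample
n_reads = 20_000       # assumed sequencing depth

# DNA-seq-like: every fragment is (roughly) equally abundant.
dna_like = [1.0] * n_fragments
# RNA-seq-like: a few fragments dominate (abundance proportional to 1/rank).
rna_like = [1.0 / (rank + 1) for rank in range(n_fragments)]

dna_dup = duplicate_fraction(dna_like, n_reads)
rna_dup = duplicate_fraction(rna_like, n_reads)

print(f"DNA-like duplicate fraction: {dna_dup:.3f}")
print(f"RNA-like duplicate fraction: {rna_dup:.3f}")
```

Every duplicate in this sketch is a "biological" one by construction, yet the skewed pool produces far more of them than the uniform pool at the same depth.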

Note that extra-deep coverage in any sequencing experiment (DNA-seq included) will tend to generate more natural duplicates, simply because of saturation: the deeper you sequence, the more likely it is that two independent fragments are identical by chance.
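The saturation effect can be sketched with a back-of-the-envelope calculation. It assumes reads are drawn uniformly and independently from a fixed pool of distinct fragments, which real libraries only approximate; `expected_duplicate_fraction` and the library size used here are hypothetical.

```python
def expected_duplicate_fraction(n_distinct, n_reads):
    """Expected share of reads that duplicate an earlier read.

    Assumes uniform sampling with replacement from n_distinct fragments,
    so the expected number of distinct fragments observed is
    n_distinct * (1 - (1 - 1/n_distinct) ** n_reads).
    """
    expected_unique = n_distinct * (1 - (1 - 1 / n_distinct) ** n_reads)
    return 1 - expected_unique / n_reads


library_size = 1_000_000  # hypothetical number of distinct fragments
depths = [10_000, 100_000, 1_000_000, 10_000_000]
fractions = [expected_duplicate_fraction(library_size, n) for n in depths]

for n, frac in zip(depths, fractions):
    print(f"{n:>10,} reads -> expected duplicate fraction {frac:.3f}")
```

Under these assumptions the duplicate fraction is negligible at low depth and climbs steadily once the number of reads approaches the number of distinct fragments, with no technical artefact involved.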


1

I'd say it's not that PCR duplication is less of a concern in RNA-seq than in DNA-seq; it's just harder to identify (that is, to identify duplication that is definitely PCR rather than biological), and there is little we can do about it when it is present.

3.0 years ago by i.sudbery 20k

0

I agree that optical and PCR duplicates are always a concern, regardless of the -seq method. Duplicates in general, less so. For instance, in FastQC, the sequence duplication metric usually "fails" and raises a flag with RNA-seq. Despite the flag, one should not be concerned about it, because it is normal.

I interpreted the OP's question as being about duplicate sequences in general, but you are right to clarify.

3.0 years ago by Carlo Yague 8.9k