Why is duplication a greater concern in DNA-seq than RNA-seq?
Because in RNA-seq, it is normal to have what we call biological duplicates. These are reads with exactly the same sequence that originate from the original RNA extract, not from a PCR/cluster artifact. This happens because, in a cell, some RNA molecules are much more abundant than others. They are present in so many copies that the probability of sequencing them multiple times is quite high. Such natural, biological duplication is less common in whole-genome sequencing experiments (DNA-seq), where each chromosome is present in a single copy (or a few copies, depending on the ploidy) in each cell. Read duplication in such settings usually represents PCR duplicates or optical duplicates, two kinds of technical artifacts that are of greater concern than normal biological duplicates.
Note that extra-deep coverage of any sequencing experiment (DNA-seq included) will tend to generate more natural duplicates, simply because of signal saturation.
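To make the argument above concrete, here is a rough toy simulation, not a model of any real library prep. It assumes reads are drawn independently from a pool of distinct templates, with uniform weights standing in for DNA-seq and Zipf-like weights standing in for RNA-seq abundances. The function name `duplicate_fraction`, the pool size, and the read depths are all made up for illustration, and no PCR or optical duplication is simulated, so every duplicate counted here is a "natural" one.

```python
import random


def duplicate_fraction(weights, n_reads, seed=0):
    """Fraction of reads whose template was already sampled by an earlier read.

    Hypothetical helper: each index in `weights` stands for one distinct
    template (a fragment that would yield one distinct read sequence).
    """
    rng = random.Random(seed)
    draws = rng.choices(range(len(weights)), weights=weights, k=n_reads)
    # Every read beyond the first one per template counts as a duplicate.
    return 1 - len(set(draws)) / n_reads


n_templates = 50_000  # assumed pool size, chosen only to keep the run fast

# DNA-seq-like: every template is equally likely to be sequenced.
uniform = [1.0] * n_templates

# RNA-seq-like: a few templates dominate (Zipf-like abundances, an assumption).
skewed = [1.0 / (rank + 1) for rank in range(n_templates)]

for n_reads in (10_000, 100_000):
    dna = duplicate_fraction(uniform, n_reads)
    rna = duplicate_fraction(skewed, n_reads)
    print(f"{n_reads:>7} reads  uniform: {dna:.3f}  skewed: {rna:.3f}")
```

Under these assumptions, the skewed pool gives a much higher duplicate fraction at the same depth, and both fractions climb as the number of reads approaches or exceeds the number of templates, which is the saturation effect mentioned in the note.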
I'd say it's not that PCR duplication is less of a concern in RNA-seq than in DNA-seq; it's just harder to identify (that is, to identify duplication that is definitely PCR rather than biological), and there is little we can do about it when it is present.
I agree that optical and PCR duplicates are always concerning regardless of the -seq method. But duplicates in general, less so. For instance, in FastQC, the duplicate sequence metric always "fails" and raises a flag with RNA-seq. Despite the flag, one should not be concerned about that, because it is normal. I interpreted the OP's question as being about duplicate sequences in general, but you are right to clarify.