2.8 years ago
donny.dw
From the internet:
There are two main sources of duplicates: polymerase chain reaction (PCR) duplicates and natural duplicates. Unlike natural duplicates that represent true signals from sequencing of independent DNA templates, PCR duplicates are artifacts originating from sequencing of identical copies amplified from the same DNA template.
During library prep, a DNA or RNA fragment is amplified roughly 2^10-fold. Why are duplicate reads bad for sequencing?
They don't provide much new information. A PCR copying error that occurs in an early cycle is propagated into every copy descended from that molecule. So if a short variant shows up only in reads that are PCR copies of the same fragment, however many of them there are, it is usually discarded as an artefact. There are smarter ways to do PCR for applications such as liquid biopsy, which tag the initial DNA fragments with unique barcodes (often called unique molecular identifiers, UMIs), but this is not standard practice.
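To put rough numbers on the early-cycle error point, here is a minimal sketch under an idealized model: every molecule doubles perfectly in each cycle, and the error arises in exactly one newly made copy. Real PCR efficiency is below 100%, so treat this as an illustration only. The function name `fraction_carrying_error` is made up for this example.

```python
def fraction_carrying_error(error_cycle: int) -> float:
    """Fraction of final copies carrying an error first made at `error_cycle`.

    Assumptions (idealized): perfect doubling every cycle, and the error
    appears in exactly one newly synthesized copy during `error_cycle`.
    """
    if error_cycle < 1:
        raise ValueError("error_cycle must be >= 1")
    # After `error_cycle` cycles there are 2**error_cycle molecules and one
    # of them is the mutant. Every lineage doubles equally afterwards, so
    # the mutant fraction stays the same for the rest of the reaction.
    return 1 / 2 ** error_cycle


total_cycles = 10
copies_per_template = 2 ** total_cycles
print(copies_per_template)  # 1024

for cycle in (1, 2, 5, 10):
    print(cycle, fraction_carrying_error(cycle))
```

Under these assumptions an error made in cycle 1 ends up in half of the 1024 copies, while one made in cycle 10 is in a single copy. That is why many duplicate reads agreeing on a variant are weak evidence if they all trace back to one template.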
Some techniques (such as AmpliSeq) rely heavily on multiplex PCR, and removing PCR duplicates there is very tricky, since almost all of the reads come from duplicated fragments.
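A toy sketch of why this is tricky, assuming duplicates are identified by mapping coordinates alone (a common approach) and that each read optionally carries a barcode of its original molecule. The read tuples, the `dedup` helper and the barcode strings are all hypothetical and only meant to illustrate the idea, not any particular tool.

```python
def dedup(reads, use_umi=False):
    """Keep the first read of every duplicate group.

    reads: list of (chrom, start, end, umi) tuples (toy format, assumed here).
    Without UMIs, reads sharing coordinates count as duplicates.
    With UMIs, they must also share the molecule barcode.
    """
    seen = set()
    kept = []
    for chrom, start, end, umi in reads:
        key = (chrom, start, end, umi) if use_umi else (chrom, start, end)
        if key not in seen:
            seen.add(key)
            kept.append((chrom, start, end, umi))
    return kept


# Randomly fragmented library: independent templates rarely share coordinates.
shotgun = [
    ("chr1", 100, 250, "AAC"),
    ("chr1", 100, 250, "AAC"),  # PCR copy of the read above
    ("chr1", 130, 280, "GTT"),
    ("chr1", 171, 320, "CCA"),
]

# Amplicon library: primers fix the ends, so every read has the same coordinates.
amplicon = [
    ("chr1", 100, 250, "AAC"),
    ("chr1", 100, 250, "AAC"),
    ("chr1", 100, 250, "GTT"),
    ("chr1", 100, 250, "CCA"),
    ("chr1", 100, 250, "GTT"),
]

print(len(dedup(shotgun)))                 # 3
print(len(dedup(amplicon)))                # 1
print(len(dedup(amplicon, use_umi=True)))  # 3
```

In the amplicon case, coordinate-only deduplication collapses three independent molecules into one read, whereas keying on the barcode as well keeps one read per original molecule.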