Question

What is Deduplication in the context of Sequence Duplication Levels?

9.6 years ago

irritable_phd_syndrome ▴ 130

I am reading through the FastQC documentation, and where it describes the "Sequence Duplication Levels" plot that FastQC generates, it states:

... [In] the red plot the sequences are de-duplicated and the proportions shown are the proportions of the deduplicated set which come from different duplication levels in the original data.

I understand that they are binning duplicate transcripts, but I don't understand what "deduplication" means here. My naive guess is that it means removing the duplicates, but then the red line should be at 100% at x = 1, and that is clearly not the case.

Some explanation would be of great help. Thanks!

RNA-Seq fastqc • 12k views

updated 2.7 years ago by Ram 45k • written 9.6 years ago by irritable_phd_syndrome ▴ 130

Ram · Answer 1 · 2015-10-13

9.6 years ago

Istvan Albert 102k

The concepts are a little more difficult to untangle:

Revisiting the FastQC read duplication report
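
To make the quoted sentence concrete, here is a minimal sketch of how the two curves could be computed, as I read the documentation. It is not FastQC's actual code: the function name `duplication_profiles` and the toy reads are made up for illustration, and it ignores the sampling and binning of duplication levels that the real report applies.

```python
from collections import Counter

def duplication_profiles(reads):
    """Return (blue, red) as dicts mapping duplication level -> percentage.

    blue: share of ALL reads that sit at each duplication level.
    red:  share of DISTINCT sequences (the de-duplicated set) whose
          original duplication level was that value.
    """
    copies = Counter(reads)                   # sequence -> number of copies
    level_counts = Counter(copies.values())   # level -> number of distinct sequences
    total_reads = len(reads)
    total_distinct = len(copies)
    blue = {lvl: 100.0 * lvl * n / total_reads for lvl, n in level_counts.items()}
    red = {lvl: 100.0 * n / total_distinct for lvl, n in level_counts.items()}
    return blue, red

# Toy data (hypothetical): 6 sequences seen once, 2 sequences seen twice,
# and 1 sequence seen 10 times, for 20 reads and 9 distinct sequences.
reads = ["U%d" % i for i in range(6)] + ["D1", "D2"] * 2 + ["H"] * 10
blue, red = duplication_profiles(reads)

for level in sorted(red):
    print(level, round(blue[level], 1), round(red[level], 1))
```

With this toy input the red value at level 1 is about 66.7%, not 100%. De-duplication keeps one copy of every distinct sequence, but each surviving copy is still counted under the duplication level it had in the original data, so the sequences seen twice or ten times show up at x = 2 and x = 10 rather than at x = 1.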

updated 2.7 years ago by Ram 45k • written 9.6 years ago by Istvan Albert 102k