I am reading through the FastQC documentation. When describing the "Sequence Duplication Levels" plot that FastQC generates, it states:
... the red plot the sequences are de-duplicated and the proportions shown are the proportions of the deduplicated set which come from different duplication levels in the original data.
I understand that the plot bins sequences by their duplication level, but I don't understand what "deduplication" means here. My naive guess is that it removes the duplicates, but then the red line should be at 100% at x = 1, and that is clearly not the case.
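To make the question concrete, here is a minimal sketch of one possible reading of the quoted sentence. It assumes each distinct sequence is kept once but stays in the bin of its original copy number. The function name `duplication_profile` and the toy reads are made up for illustration, and this is not FastQC's actual code (the binning of higher duplication levels, for one, is ignored).

```python
from collections import Counter

def duplication_profile(reads):
    """Return {level: (pct_of_total, pct_of_deduplicated)} for a list of reads.

    Hypothetical helper, not part of FastQC.
    """
    copies = Counter(reads)             # distinct sequence -> number of copies
    levels = Counter(copies.values())   # duplication level -> number of distinct sequences
    total = len(reads)                  # all reads, duplicates included
    distinct = len(copies)              # size of the deduplicated set
    return {
        level: (100.0 * level * n / total,  # share of all reads at this level
                100.0 * n / distinct)       # share of distinct sequences at this level
        for level, n in sorted(levels.items())
    }

# Toy data: two sequences seen once, one sequence seen three times.
reads = ["AAAA", "CCCC", "GGGG", "GGGG", "GGGG"]
for level, (pct_total, pct_dedup) in duplication_profile(reads).items():
    print(f"level {level}: {pct_total:.1f}% of total, {pct_dedup:.1f}% of deduplicated")
# level 1: 40.0% of total, 66.7% of deduplicated
# level 3: 60.0% of total, 33.3% of deduplicated
```

Under this reading, the deduplicated percentage at x = 1 would be below 100% whenever any sequence occurs more than once, but I am not sure this is what the documentation means.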
Some explanation would be of great help. Thanks!