Hi,
Can anyone please share some known statistics on read duplication levels in Illumina mate-pair libraries? We have a lane of HiSeq 8 kb mate-pair reads (200 million genomic reads, 100 bp). FastQC reports ~95% duplicated reads and CLC reports ~92%, which indicates an extremely high level of duplication. When we pointed this out to our sequencing provider, they said that 80-90% duplication is common in 8 kb mate-pair libraries. Is that really the case? We understand that read duplication can be high in mate-pair libraries, but if it is in the 80-90% range, are the remaining 10-20% of unique reads of any use for projects like de novo assembly (scaffolding, closing gaps, etc.)? Please shed some light if you have faced or seen issues like this.
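For a sense of scale, here is a minimal Python sketch of the arithmetic behind the question. It assumes that the reported duplication percentage can be read as the fraction of reads that would be discarded by deduplication, which may not hold exactly because tools define and estimate duplication differently. The function name `unique_reads` is just illustrative.

```python
def unique_reads(total_reads: int, duplication_rate: float) -> int:
    """Estimate how many reads remain after collapsing duplicates.

    Assumption: duplication_rate is the fraction of reads that are
    redundant copies, so (1 - duplication_rate) of the reads are kept.
    """
    if not 0.0 <= duplication_rate <= 1.0:
        raise ValueError("duplication_rate must be between 0 and 1")
    return round(total_reads * (1.0 - duplication_rate))


TOTAL_READS = 200_000_000  # reads in the lane, as stated in the post

# Duplication rates mentioned in the post.
scenarios = [
    ("FastQC (~95%)", 0.95),
    ("CLC (~92%)", 0.92),
    ("provider, upper bound (90%)", 0.90),
    ("provider, lower bound (80%)", 0.80),
]

for label, rate in scenarios:
    kept = unique_reads(TOTAL_READS, rate)
    print(f"{label}: about {kept:,} unique reads")
```

Under that assumption, the lane would retain roughly 10 to 16 million unique reads at the rates FastQC and CLC report, compared with 20 to 40 million at the 80-90% the provider describes. Whether that is enough for scaffolding depends on the genome size, which is not given here.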
Cheers.
Biostar is a Q&A site, not a forum. I would suggest creating a new question rather than adding an answer that contains a new question.