Hi
I got the following MultiQC report for sequence duplication levels. As you can see, more than 20% of the reads are duplicated over 1k times, and some over 10k times. I think there might be quality issues with my data. Am I correct?
Thanks in advance
Not necessarily. You should check whether you have a problem with PCR duplicates. If you don't, you may simply have some genes expressed at very high levels, which also produces many identical reads. Also see: https://sequencing.qcfail.com/articles/libraries-can-contain-technical-duplication/
How do I see whether I have a problem with PCR duplicates? Using IGV?
Don't remove duplicates in RNA-Seq data.
Duplicate removal is usually performed for variant calling, but almost never for RNA-Seq.
That is one way. The very first plot in the link I posted shows what a sample with PCR duplicates would look like in IGV.
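Another rough way to check, beyond eyeballing IGV, is to compare duplication against expression per gene. Highly expressed genes are expected to have many reads starting at the same position, whereas heavy duplication in lowly expressed genes is more suggestive of PCR artifacts. Below is a minimal Python sketch of that idea, not a definitive method. It assumes you have already extracted one (gene, chrom, strand, start) tuple per aligned read; the function name `duplication_by_gene` and the toy reads are made up for illustration.

```python
from collections import Counter, defaultdict


def duplication_by_gene(reads):
    """Summarize duplication per gene.

    reads: iterable of (gene, chrom, strand, start) tuples, one per aligned read
           (hypothetical input format, assumed for this sketch).
    Returns {gene: (total_reads, duplication_rate)}, where a read counts as a
    duplicate if another read of the same gene shares its chrom/strand/start.
    """
    positions = defaultdict(Counter)
    for gene, chrom, strand, start in reads:
        positions[gene][(chrom, strand, start)] += 1

    summary = {}
    for gene, counts in positions.items():
        total = sum(counts.values())   # all reads assigned to the gene
        unique = len(counts)           # distinct start positions
        summary[gene] = (total, (total - unique) / total)
    return summary


# Toy data, invented for illustration only.
reads = (
    # geneA: highly expressed, 1000 reads spread over 50 start positions
    [("geneA", "chr1", "+", 100 + i % 50) for i in range(1000)]
    # geneB: lowly expressed, 10 reads stacked on 2 start positions
    + [("geneB", "chr2", "+", 500)] * 8
    + [("geneB", "chr2", "+", 510)] * 2
    # geneC: lowly expressed, 10 reads with 10 distinct start positions
    + [("geneC", "chr3", "-", 900 + i) for i in range(10)]
)

for gene, (total, rate) in sorted(duplication_by_gene(reads).items()):
    print(f"{gene}\t{total}\t{rate:.2f}")
```

In the toy data, geneA has a high duplication rate but also high coverage, which is expected for a strongly expressed gene. geneB has a high rate with only 10 reads, which is the pattern worth a closer look in IGV.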