Hi,
Sorry, I am a beginner in the NGS field. I used SGA preqc to assess the quality of my data.
The program estimates the PCR duplicate proportion at 20%.
1) First, I am not sure I understand what a PCR duplicate is. How does it differ from sequencing coverage?
2) Is 20% a lot?
3) If so, how do I remove them? Is it important to do this before a de novo assembly?
Thanks for your support.
1) Have you had good results from removing those duplicates (for example, did your N50 improve a lot)?
2) Should I remove them from the raw reads or the trimmed reads?