3.4 years ago
grey
While checking FastQC results from a HiSeq 4000 run of PCR-free libraries, I came across really high Sequence Duplication Levels.
Note: We were trying to sequence at high depth to detect somatic mutations (~240X).
These can't be PCR duplicates since the libraries are PCR-free, so it's a bit mysterious to me, though others might have ideas. Optical duplicates? Insufficient DNA in the sample? Should we be OK just removing duplicates, or does this indicate something systemically wrong?
You can use clumpify.sh from the BBMap suite to check the types and numbers of duplicates you have without doing alignments, and to remove them: see Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files. While it is possible that there are some cluster/optical duplicates, it may also simply be a characteristic of this library prep. I am not sure if your kit used tagmentation, so there is a possibility that similar fragments were generated,