High levels of duplicated reads in Illumina data from PCR-free libraries

3.9 years ago

grey ▴ 40

I was checking FastQC results from a HiSeq 4000 run of PCR-free libraries and came across really high Sequence Duplication Levels.

Note: We were trying to sequence at high depth to detect somatic mutations (~240X).

These can't be PCR duplicates since the libraries are PCR-free, so it's a bit mysterious to me, though others might have ideas. Optical duplicates? Insufficient DNA in the sample? Should we be OK just removing duplicates, or does this indicate something systemically wrong?

fastqc results

fastqc PCR duplicate reads Illumina • 1.9k views

ADD COMMENT • link updated 3.9 years ago by GenoMax 151k • written 3.9 years ago by grey ▴ 40

You can use clumpify.sh from the BBMap suite to check the types and numbers of duplicates you have without doing alignments, and to remove them. See: Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files.
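As a rough illustration of the clumpify.sh suggestion above, here is a minimal Python sketch that only assembles two command lines: one removing all duplicates and one restricted to optical duplicates. The file names are placeholders, the helper `clumpify_cmd` is hypothetical, and the `dupedist=2500` value is an assumption based on what is commonly suggested for patterned flow cells such as the HiSeq 4000, so please check it against the BBTools documentation for your version.

```python
def clumpify_cmd(fq_in, fq_out, optical_only=False, dupedist=None):
    """Build a clumpify.sh argument list. Nothing is executed here."""
    cmd = ["clumpify.sh", f"in={fq_in}", f"out={fq_out}", "dedupe"]
    if optical_only:
        # Restrict deduplication to duplicates that are close together on the flow cell.
        cmd.append("optical")
        if dupedist is not None:
            # Distance cutoff; 2500 below is an assumed value, verify for your platform.
            cmd.append(f"dupedist={dupedist}")
    return cmd


# Placeholder file names, not taken from the original post.
all_dupes = clumpify_cmd("reads.fq.gz", "dedup_all.fq.gz")
optical = clumpify_cmd("reads.fq.gz", "dedup_optical.fq.gz",
                       optical_only=True, dupedist=2500)

for cmd in (all_dupes, optical):
    print(" ".join(cmd))
```

Running both commands on the same input and comparing the number of reads each one removes should give a feel for how much of the duplication is optical versus everything else.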

While it is possible that there are some cluster/optical duplicates, it may also simply be a characteristic of this library prep. I am not sure whether your kit used tagmentation; if it did, there is a possibility that similar fragments were generated.
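To make the distinction between optical/cluster duplicates and other duplicates concrete, here is a simplified Python sketch. It is not how Clumpify works internally. It assumes single-end reads, exact sequence matches, and Illumina headers of the form `@instrument:run:flowcell:lane:tile:x:y`; the function names and the `max_dist` cutoff are made up for illustration. A duplicate is counted as "nearby" when an earlier identical read sits on the same lane and tile within `max_dist` coordinate units.

```python
import io
from collections import defaultdict
from math import hypot


def parse_header(header):
    """'@inst:run:flowcell:lane:tile:x:y ...' -> (lane, tile, x, y)."""
    fields = header[1:].split()[0].split(":")
    return fields[3], fields[4], int(fields[5]), int(fields[6])


def count_duplicates(handle, max_dist=2500):
    """Return (total_duplicates, nearby_duplicates) for a FASTQ handle."""
    groups = defaultdict(list)  # sequence -> list of (lane, tile, x, y)
    while True:
        header = handle.readline().strip()
        if not header:
            break
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        groups[seq].append(parse_header(header))

    total = nearby = 0
    for positions in groups.values():
        # The first read of each group is treated as the original.
        for i in range(1, len(positions)):
            lane, tile, x, y = positions[i]
            total += 1
            if any(l == lane and t == tile and hypot(x - px, y - py) <= max_dist
                   for l, t, px, py in positions[:i]):
                nearby += 1
    return total, nearby


# Tiny made-up example: three identical reads and one unique read.
demo = io.StringIO(
    "@M:1:FC:1:1101:1000:1000\nACGT\n+\nIIII\n"
    "@M:1:FC:1:1101:1010:1005\nACGT\n+\nIIII\n"
    "@M:1:FC:1:2203:9000:9000\nACGT\n+\nIIII\n"
    "@M:1:FC:1:1101:5000:5000\nTTTT\n+\nIIII\n"
)
total, nearby = count_duplicates(demo)
print(total, nearby)
```

If most duplicates turn out to be nearby ones, that points toward the flow cell; if they are spread across tiles, the library itself (for example low input or low complexity) is a more likely explanation.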

ADD REPLY • link 3.9 years ago by GenoMax 151k
