Optical Duplicates
3 hours ago
ebogen • 0

Hi, I am working with 30x WGS data for 8 individuals. While running the data through our normal lab pipeline we came across a curiosity. Each sample had, on average, a 30% optical duplication rate as marked by GATK's MarkDuplicates. On further investigation, many of the reads being marked as duplicates pile up at the same locations across all individuals. For example, at chromosome 2:32916422 each individual has around 200 thousand optical duplicates, and the same pattern holds at other locations across the genome. I am trying to understand what could cause this, or what I should look into next to troubleshoot it.
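
One way to quantify the pileups described above is to tally duplicate-flagged reads per alignment start. The following is only a minimal sketch, assuming the alignments are available as SAM text (for example piped from `samtools view`); the function name `count_duplicates_by_position` is hypothetical. It relies on the SAM flag bit 0x400, which marks a read as a PCR or optical duplicate, and does not by itself separate optical from PCR duplicates.

```python
from collections import Counter

DUP_FLAG = 0x400  # SAM flag bit: read is a PCR or optical duplicate


def count_duplicates_by_position(sam_lines):
    """Count duplicate-flagged reads per (reference name, start position).

    sam_lines: any iterable of SAM text lines (header lines are skipped).
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # not a valid alignment record
        flag = int(fields[1])
        if flag & DUP_FLAG:
            counts[(fields[2], int(fields[3]))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Print the ten positions with the most duplicate-flagged reads.
    for (chrom, pos), n in count_duplicates_by_position(sys.stdin).most_common(10):
        print(f"{chrom}:{pos}\t{n}")
```

Saved under a name of your choosing, it could be run by piping `samtools view` output for each sample into the script; comparing the top positions across samples would show whether the same sites dominate everywhere.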

bioinformatics MarkDuplicates optialduplicates gatk • 47 views

You may want to test an alternate method to verify that what you are seeing is accurate. This is a long thread, and you will want to read through it completely: "Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files. And remove duplicates."

I assume you aligned and then marked dups with GATK. clumpify.sh allows you to mark or remove duplicates in an alignment-free manner.

It seems odd that you would get "optical" duplicates from diverse samples at the same location.
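
For context on why this is odd: optical duplicates are normally called from flowcell coordinates encoded in the read name (tile, x, y), not from genome position, so reads from different libraries should not be optical duplicates of each other. Below is a rough sketch of that proximity test, assuming standard colon-separated Illumina read names ending in `tile:x:y`. The function names and the example read names are hypothetical; the default of 100 pixels mirrors Picard's documented default for `OPTICAL_DUPLICATE_PIXEL_DISTANCE` (2500 is commonly recommended for patterned flowcells). This is only loosely modeled on what MarkDuplicates does, not its actual implementation.

```python
def parse_flowcell_position(read_name):
    """Split an Illumina-style read name into (prefix, x, y).

    The prefix is everything up to and including the tile, so two reads
    share a prefix only if they come from the same run, lane and tile.
    """
    fields = read_name.split()[0].split(":")
    if len(fields) < 5:
        raise ValueError(f"unexpected read name format: {read_name!r}")
    return tuple(fields[:-2]), int(fields[-2]), int(fields[-1])


def looks_optical(name_a, name_b, pixel_distance=100):
    """Return True if two reads sit on the same tile within pixel_distance
    in both x and y (a simple box test, assumed here for illustration)."""
    prefix_a, xa, ya = parse_flowcell_position(name_a)
    prefix_b, xb, yb = parse_flowcell_position(name_b)
    if prefix_a != prefix_b:
        return False  # different run, lane or tile: cannot be optical
    return abs(xa - xb) <= pixel_distance and abs(ya - yb) <= pixel_distance


# Hypothetical read names, for illustration only.
a = "A00123:45:HXXXXXXXX:1:1101:1000:2000"
b = "A00123:45:HXXXXXXXX:1:1101:1050:2040"
c = "A00123:45:HXXXXXXXX:1:1102:1050:2040"
print(looks_optical(a, b), looks_optical(a, c))
```

If read names were stripped or rewritten before alignment (for example during an SRA round trip), the coordinates cannot be parsed reliably, and that is one thing worth ruling out when the optical duplicate counts look implausible.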


