Using GATK MarkDuplicates for targeted sequencing data
23 months ago
asalimih ▴ 60

Hello,
Following the GATK best practices (link), I'm using the MarkDuplicates tool in my somatic variant calling pipeline. When I run the pipeline on a targeted-sequencing sample (600 genes, Illumina), the duplication rate is really high (image: high_duplication_rate). In this case, is it safe to use MarkDuplicates, or am I losing a lot of informative reads, which would affect the certainty of the called somatic variants?

genomic gatk variant • 775 views
23 months ago
ATpoint 85k

The general consensus seems to be that you do not remove duplicates in targeted assays like this, since duplication is expected by design and removing them throws away far too much information. After all, if there are only one or a few amplicons per gene, then you basically have only one or a few unique fragments per gene (or at least per haplotype), so there is no real chance to deduplicate meaningfully.
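
To make the point concrete, here is a minimal Python sketch of what coordinate-based deduplication does to amplicon-style data. It is a toy model, not how MarkDuplicates is implemented: it assumes a read pair is fully described by a hypothetical `(chrom, start, end, strand)` tuple and treats any two identical tuples as duplicates. The read sets are made up for illustration.

```python
def dedup_by_coordinates(reads):
    """Keep the first read seen for each (chrom, start, end, strand) key.

    Simplified stand-in for coordinate-based duplicate removal; real tools
    also consider mate position, library, and base qualities.
    """
    seen = set()
    kept = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            kept.append(read)
    return kept


def duplication_rate(reads):
    """Fraction of reads that would be flagged as duplicates."""
    if not reads:
        return 0.0
    return 1 - len(dedup_by_coordinates(reads)) / len(reads)


# Hypothetical amplicon: 1000 fragments, all with identical coordinates,
# because every fragment is defined by the same primer pair.
amplicon_reads = [("chr1", 100, 250, "+")] * 1000

# Hypothetical randomly sheared library: 1000 fragments with distinct starts.
sheared_reads = [("chr1", 100 + i, 250 + i, "+") for i in range(1000)]

print(len(dedup_by_coordinates(amplicon_reads)), duplication_rate(amplicon_reads))
print(len(dedup_by_coordinates(sheared_reads)), duplication_rate(sheared_reads))
```

In this toy case the amplicon collapses from 1000 reads to a single one, while the sheared library keeps all of its reads. That is the loss of depth described above: with amplicons, identical coordinates do not tell you whether two reads came from the same original molecule.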
