GATK Pipeline: MarkDuplicates at the End?
12.1 years ago

The GATK pipeline marks duplicates near the end, after merging the BAMs.

To remove optical duplicates lane by lane, I would have put this step right after the BWA alignment of each lane/sample, which can be parallelized and is therefore faster.

Is there a reason to mark duplicates at this point in their pipeline?

http://cdn.vanillaforums.com/gatk.vanillaforums.com/FileUpload/55/0a67f9e1b7962a14c422e993f34643.jpeg

gatk next-gen bam markduplicates pipeline workflow duplicates
12.1 years ago

As far as I understand, they are sequencing the same LIBRARY on different lanes. I don't know what you mean by "optical duplicates", but what you want to get rid of are PCR duplicates, i.e. copies of the same molecule (produced during PCR amplification) that are sequenced more than once, either as two "spots" on the same lane or on different lanes. That's why you need to mark duplicates after you have merged all reads from a given library. You could also mark duplicates before merging, but you definitely need to do it after merging.

I hope this helps
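A minimal Python sketch of why the merge has to come first, assuming a deliberately simplified duplicate key of (library, chromosome, strand, 5' position). Real tools such as Picard MarkDuplicates use more information (for example mate positions and unclipped 5' ends). The `Read` record, the `mark_duplicates` helper and the reads themselves are hypothetical. Marking each lane on its own misses a copy of the same molecule that landed on another lane; marking after merging catches it.

```python
from collections import namedtuple

# Hypothetical, simplified read record (not a real BAM/SAM API).
Read = namedtuple("Read", "name lane library chrom strand pos")


def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates.

    Reads sharing (library, chrom, strand, pos) are treated as copies of one
    molecule: the first one seen is kept, the rest are flagged.
    """
    seen = set()
    dups = set()
    for r in reads:
        key = (r.library, r.chrom, r.strand, r.pos)
        if key in seen:
            dups.add(r.name)
        else:
            seen.add(key)
    return dups


reads = [
    Read("r1", "L1", "libA", "chr1", "+", 1000),
    Read("r2", "L1", "libA", "chr1", "+", 1000),  # same-lane copy of r1
    Read("r3", "L2", "libA", "chr1", "+", 1000),  # other-lane copy of r1
    Read("r4", "L2", "libA", "chr1", "-", 5000),
]

# Marking each lane separately misses the cross-lane copy (r3).
per_lane = set()
for lane in ("L1", "L2"):
    per_lane |= mark_duplicates([r for r in reads if r.lane == lane])

# Marking after merging all lanes of the library catches it.
merged = mark_duplicates(reads)

print(sorted(per_lane))  # ['r2']
print(sorted(merged))    # ['r2', 'r3']
```

Per-lane marking is still harmless as an early step; it just cannot replace a final pass over the merged library.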


Optical duplicate = two spots, close to each other on the flow cell, that come from the same fragment.
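To make that definition concrete, here is a sketch that decides whether two reads already flagged as duplicates are optical duplicates. It assumes the Casava 1.8+ Illumina read-name layout (instrument:run:flowcell:lane:tile:x:y); the helper names, the example read names and the 100-pixel threshold are illustrative assumptions, and Euclidean distance is a simplifying choice (Picard exposes a comparable setting, OPTICAL_DUPLICATE_PIXEL_DISTANCE). Under this definition both spots sit on the same lane and tile, so per-lane marking would catch them.

```python
import math


def parse_location(read_name):
    """Extract (lane, tile, x, y) from an Illumina-style read name.

    Assumes the last four ':'-separated fields are lane, tile, x and y,
    as in instrument:run:flowcell:lane:tile:x:y.
    """
    lane, tile, x, y = read_name.split(":")[-4:]
    return int(lane), int(tile), int(x), int(y)


def is_optical_duplicate(name_a, name_b, max_pixel_distance=100):
    """Return True if two duplicate reads lie on the same lane and tile and
    their cluster coordinates are within max_pixel_distance (Euclidean)."""
    lane_a, tile_a, xa, ya = parse_location(name_a)
    lane_b, tile_b, xb, yb = parse_location(name_b)
    if (lane_a, tile_a) != (lane_b, tile_b):
        return False
    return math.hypot(xa - xb, ya - yb) <= max_pixel_distance


a = "INSTR1:42:FC001:1:1101:15000:2000"
b = "INSTR1:42:FC001:1:1101:15040:2030"  # same tile, 50 px away
c = "INSTR1:42:FC001:2:1101:15000:2000"  # same coordinates, other lane
print(is_optical_duplicate(a, b))  # True
print(is_optical_duplicate(a, c))  # False
```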


OK, then those should be rare. PCR duplicates can be far more numerous and more serious: we have had libraries with up to 70% PCR duplicates. Clearly not good libraries.
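To put a figure like 70% in perspective, a small arithmetic sketch; the library size of 100 million reads is a hypothetical number, not from the thread. At that duplication level only 30% of the reads represent distinct molecules.

```python
def duplicate_fraction(total_reads, duplicate_reads):
    """Fraction of reads flagged as duplicates (0.0 to 1.0)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return duplicate_reads / total_reads


# Hypothetical library: 100 million reads, 70 million flagged as duplicates.
total = 100_000_000
dups = 70_000_000
frac = duplicate_fraction(total, dups)
unique = total - dups
print(f"{frac:.0%} duplicates, {unique:,} unique reads")
# 70% duplicates, 30,000,000 unique reads
```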


OK, the PCR-duplicate argument is a good one.
