Hi fellows,
MarkDuplicates in Picard is said to match all read pairs that have identical 5' coordinates and orientations, and to mark all but the 'best' pair as duplicates. Suppose I have three such pairs, one of which is the 'best' pair, and all of them truly come from the target genome rather than from sequencing artifacts. If I set REMOVE_DUPLICATES=True, the two non-best pairs will be deleted, which decreases the coverage for that region. This doesn't make sense to me, so maybe I have misunderstood the purpose of MarkDuplicates. So my question is: what is the purpose of MarkDuplicates, and why does it delete the duplicates?
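To make my understanding concrete, here is a minimal Python sketch of the behaviour as I described it. It is only a toy model, not Picard's actual code: the `ReadPair` type, the `mark_duplicates` function, and the single `score` field used to pick the 'best' pair are all made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical, simplified record for one aligned read pair."""
    name: str
    chrom: str
    start1: int       # 5' coordinate of the first read
    start2: int       # 5' coordinate of the mate
    orientation: str  # e.g. "FR"
    score: int        # assumed quality score used to choose the 'best' pair


def mark_duplicates(pairs):
    """Return the names of pairs that would be flagged as duplicates."""
    # Group pairs that share 5' coordinates and orientation.
    groups = defaultdict(list)
    for p in pairs:
        groups[(p.chrom, p.start1, p.start2, p.orientation)].append(p)

    duplicates = set()
    for group in groups.values():
        # Keep the highest-scoring pair, flag every other pair in the group.
        best = max(group, key=lambda p: p.score)
        for p in group:
            if p is not best:
                duplicates.add(p.name)
    return duplicates


pairs = [
    ReadPair("pairA", "chr1", 100, 350, "FR", 70),
    ReadPair("pairB", "chr1", 100, 350, "FR", 65),
    ReadPair("pairC", "chr1", 100, 350, "FR", 60),
    ReadPair("pairD", "chr1", 120, 360, "FR", 50),
]
print(sorted(mark_duplicates(pairs)))  # ['pairB', 'pairC']
```

In this toy case pairA, pairB and pairC share the same coordinates and orientation, so only pairA survives and the region drops from three pairs to one, which is exactly the coverage loss I am asking about.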
Thanks in advance
Lots of previous information in these threads: http://biostar.stackexchange.com/search?q=duplicates