3.7 years ago
maxrwjones
Hi all,
I've seen many Q&As on here and elsewhere stating that the main advantage of Picard MarkDuplicates over samtools rmdup is that the former removes interchromosomal duplicates while the latter does not.
My question is: if duplicate reads are identified by having the same 5' start coordinate, how can reads mapping to different chromosomes (interchromosomal) ever be considered duplicates? They would have different coordinates.
Cheers!
Thanks for your answer :)
This makes sense, and I understand there can be duplicated regions of sequence. It was my understanding, though, that these tools do not detect sequence identity: they simply flag a read if its 5' mapped coordinate is the same as that of another read. Even duplicated regions on different chromosomes would have different coordinates... they would be millions of bases apart in the concatenated genomic reference.
But maybe the tools are sorting by both coordinates and sequence?
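To make my mental model concrete, here is a minimal sketch of the coordinate-only logic I described above. It is not the actual Picard or samtools implementation. The `Read` fields, the `five_prime` helper, and the keep-the-first-read rule are all my own assumptions for illustration, and it handles single reads only, with no sequence comparison.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read record (not a real library type)."""
    name: str
    chrom: str
    pos: int       # 1-based leftmost mapped position
    length: int    # aligned length on the reference
    reverse: bool  # True if mapped to the reverse strand


def five_prime(read: Read) -> int:
    """5' coordinate: leftmost base on the forward strand, rightmost on the reverse."""
    return read.pos + read.length - 1 if read.reverse else read.pos


def flag_duplicates(reads: list[Read]) -> set[str]:
    """Flag reads sharing (chromosome, strand, 5' coordinate) with an earlier read.

    Assumption: the first read seen in each group is kept. Real tools
    choose which read to keep by quality, which is not modelled here.
    """
    groups: dict[tuple[str, bool, int], list[str]] = defaultdict(list)
    for r in reads:
        groups[(r.chrom, r.reverse, five_prime(r))].append(r.name)
    duplicates: set[str] = set()
    for names in groups.values():
        duplicates.update(names[1:])
    return duplicates


reads = [
    Read("a", "chr1", 100, 50, False),
    Read("b", "chr1", 100, 50, False),  # same chromosome, strand and 5' end as "a"
    Read("c", "chr5", 100, 50, False),  # same position number, different chromosome
    Read("d", "chr1", 51, 50, True),    # 5' end is also 100, but on the reverse strand
]
print(flag_duplicates(reads))
```

Under this model, "c" is never grouped with "a" because the chromosome is part of the key, which is exactly why I don't see how a single read could count as an interchromosomal duplicate.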