Hello Biostars,
It is my understanding that samtools marks duplicates on the basis of the 5' position of reads and also the orientation of reads. This is based on my reading of the following:
http://www.htslib.org/algorithms/duplicate.html
However, I am not sure what exactly is meant by 'orientation' in this context.
In my mind, this can be interpreted in two ways:
1) Whether the two reads of a pair face inwards, outwards, or in the same direction.
2) Everything in (1), plus whether read 1 maps to the same strand in each pair being compared (i.e. the F1R2 vs F2R1 nomenclature; cf. "Orientation of PE reads a review of --fr --ff and --rf meanings").
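
To make the two readings concrete, here is a rough Python sketch of how I would derive each one from a SAM record. This is only my own illustration and not samtools' actual logic. The flag bits (0x10, 0x20, 0x40) come from the SAM specification; the function names are made up, and comparing leftmost positions to decide which read is "on the left" is a simplifying assumption.

```python
# SAM flag bits, as defined in the SAM specification
READ_REVERSE = 0x10   # this read maps to the reverse strand
MATE_REVERSE = 0x20   # its mate maps to the reverse strand
FIRST_IN_PAIR = 0x40  # this read is read 1 of the pair


def pair_layout(flag, pos, mate_pos):
    """Interpretation 1: do the two reads face inwards, outwards, or the same way?"""
    read_rev = bool(flag & READ_REVERSE)
    mate_rev = bool(flag & MATE_REVERSE)
    if read_rev == mate_rev:
        return "same"
    # Assumption: the read with the smaller leftmost position is the left-hand read.
    left_rev = read_rev if pos <= mate_pos else mate_rev
    # Left read forward means the reads point towards each other.
    return "outward" if left_rev else "inward"


def read1_strand_label(flag):
    """Interpretation 2: which strand does read 1 map to?

    Only meaningful when the two reads are on opposite strands.
    """
    is_first = bool(flag & FIRST_IN_PAIR)
    # Strand of read 1, whichever record of the pair we are looking at.
    read1_rev = bool(flag & READ_REVERSE) if is_first else bool(flag & MATE_REVERSE)
    return "F2R1" if read1_rev else "F1R2"


# Two pairs covering the same positions, both facing inwards,
# but with read 1 on opposite strands (flags 99 and 163).
for flag in (99, 163):
    print(flag, pair_layout(flag, 100, 300), read1_strand_label(flag))
```

Under reading (1) the two example pairs have the same orientation; under reading (2) they do not. My question is which of these comparisons the documentation means.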
If anybody can clarify which of these is meant, that would be great.
Thanks