Hi everyone,
I was wondering if you can share any tips to mitigate the following issue. I am interested in getting the count of reads carrying the reference and alternate reads at a number of genomic positions, which is a straightforward mpileup question. However, I have noticed that when I run the mpileup command from the command line I get a different answer compared to Rsamtools pileup function. After some drilldown, the discrepancy appears to be due to the following two parameters:
1) The command line samtools discards orphan reads (anomalous read pairs) by default. Rsamtools pileup does not seem to do so.
2) The command line samtools performs read pair overlap detection to avoid double counting read pairs. It looks like Rsamtools does not.
My question is whether there is a way to make Rsamtools behavior match the command line samtools.
Cheers!
This is probably not what you want but wouldn't it be the easiest to simply run normal
mpileup
, be it from inside R or command line and then load this into R?? It can be difficult to make sure two tools perform identical covering all edge cases so I wonder if it is worth the effort.Thank you for your input. I have considered this possibility as well. However, given that Rsamtools is supposedly "an interface to the samtools" , I wonder why the default behavior is different between the two and how we can make them match.