I currently have variant data per sample and the corresponding RNA seq BAMs per sample. What I'm trying to do is tap into the corresponding bam files, determine the number of reads associated with the variant, and then deduce the %ref and %alt alleles in those reads.
So for instance, I have a variant of interest at position chr2:86042011
, where the ref allele is CCAGA and the alt allele is C, and let's say there are 10 reads that cover that variant position - I want to know how many of those reads have the ref allele and how many have the alt allele.
I'm currently trying to use samtools mpileup -I [bed file] [bam]
. So taking the above example again, I feed a BED file that looks like:
chr2 86042010 86042012
The output I then get from running that command is:
chr2 86042011 N 2 CC sJ
chr2 86042012 N 2 CC sJ
One of my confusions is how you know how many reads cover your variant. For instance, should my BED file intervals be greater? Am I doing this right?
You can use
samtools depth
to get the number of reads covering a position.Thanks! So going off of my example above, I did
samtools depth -r chr2:86042011-86042011 myBAM.bam
and got:So how does that read depth number then get factored in to the
samtools mpileup
command? Does that mean the interval has to be86042011- 86042014
or how do I know the corresponding position of the 3 reads covering the variant?Also, why is there a discrepancy between what
mpileup
(2) gives me versusdepth
(3) in the number of reads?See: Difference between samtools DEPTH and MPILEUP