I am looking to extract all of the reads that overlap a particular SNP location. For example, if I have a SNP at 387840, I want any read containing that position to be output to a new file.
I have tried:
samtools view -o filteredresults.sam alignment.sorted.bam chr2:387840
Which outputs filteredresults.sam with 1636 lines. Yet according to the output of:
samtools mpileup -f reference.fa -d 1000 alignment.sorted.bam > alignment.mp
This location should only have 16 reads overlapping.
Is samtools wrong for this purpose? Or are my filtering criteria off? Or am I misinterpreting the mpileup output?
Thanks in advance!
P.S. I know my questions are fairly basic now, but I really appreciate everyone's help!
EDIT: alignment.mp at position 387840 says:
chr2 387840 C 16 G$G$,..,,...,..,^~, %!#BIIC#III+HI#B
That did the trick! Thanks!
You're right, and a careful reading of the manual confirms that this is intended. I definitely wouldn't have guessed that behavior, though! I guess when the OP said the reads were actually around the target coordinates, they were wrong or very unlucky that it was near the end of the chromosome.
There still might be a slight difference in counts due to the filtering I describe in my answer, but this is definitely the main problem for being orders of magnitude off.
Thanks. It seems the filteredresults.sam output does not have header, how could I output a sam file with header?