I have done GATK's UnifiedGenotyper and get some variations, One of these variations is really the one I'm interested in, that's a T>G mutation(T is the reference base, G is the query reads base), T:6890, while G gets 162.
I want to get the 162 reads with G on the mutation site, how can I done this, I have the vcf files and bam files which were generated by GATK. could any one kindly help me? thank you!
how will he only get the 162 reads with G on the mutation site using samtools view ?
That's why you're "almost there". I was going to suggest using
grep 'NNGN'
on the output of the reads, but that is not guaranteed to be specific, or match all possible instances of this site. Your solution is clearly appropriate if all the OP wants to do is visually inspect the reads, but I suspect that "get the 162 reads" means actually getting the reads for a further operation.Yes, with your help, I 'm almost there, I can write some scripts to check the special site's base to determine whether to report that read, but that would make me to spend lots of time, if there were some tools that could do this would be better.
You might use
samtools mpileup
to get base calls and quality scores for that site, but I'm not exactly sure what information you would like to "determine whether to report that read".