I have BAM files for a trio of exomes that have gone through GATK's base quality score recalibration (BQSR). The BAMs were also joint-called together with many other samples using GATK's joint-calling workflow. For a handful of variants, I want to obtain the per-sample mean MQ and the raw allele counts.
One option would be samtools mpileup, but it is deprecated and will eventually be removed. I have read through the bcftools mpileup manual, and it seems that I can get site-level mapping quality (INFO/MQ) but not per-sample mapping quality (FORMAT/MQ), so what I get would presumably be the mean across all the BAMs I give as input. Likewise, I can get the cumulative raw depth (INFO/DP) but not the per-sample raw depth (FORMAT/DP) or allele-specific depth (FORMAT/AD).
I have two questions, one method-specific and one about the bigger picture:
- Is there another tool or set of parameters I could use to obtain these sample-specific statistics?
- Is there a reason I've missed for why these tools have moved away from sample-specific MQ, raw AD, and raw DP?
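To my knowledge, only samtools mpileup's VCF/BCF output (the old -g/-u options) was retired in favor of bcftools; the plain-text pileup is still produced, and with --output-MQ it carries one mapping-quality character per read for every input BAM. The sketch below assumes that text format and parses one line into per-sample depth, raw allele counts and mean MQ. The function names are hypothetical, and samtools' default read and base-quality filters (e.g. -Q) still apply unless relaxed. It may also be worth checking bcftools mpileup's -a/--annotate option, which I believe accepts FORMAT/AD and FORMAT/DP.

```python
from collections import Counter


def parse_read_bases(bases, ref):
    """Count alleles in one sample's mpileup read-base string, skipping markup."""
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # Start of a read: the next character encodes its mapping quality.
            i += 2
        elif c == "$":
            # End of a read.
            i += 1
        elif c in "+-":
            # Indel after this position, e.g. "+2AC" or "-3TTG". Counted under its
            # own key, in addition to the base it follows.
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            length = int(bases[i + 1:j])
            counts[c + bases[j:j + length].upper()] += 1
            i = j + length
        else:
            if c in ".,":
                counts[ref.upper()] += 1   # matches the reference (either strand)
            elif c in "*#":
                counts["*"] += 1           # base deleted in this read
            elif c not in "<>":           # "<" / ">" are reference skips, not bases
                counts[c.upper()] += 1
            i += 1
    return counts


def summarize_pileup_line(line):
    """Per-sample depth, allele counts and mean MQ for one mpileup text line.

    Assumes four columns per sample (depth, bases, base quals, mapping quals),
    i.e. output produced with --output-MQ.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[2]
    samples = []
    for k in range(3, len(fields), 4):
        depth, bases, _base_quals, map_quals = fields[k:k + 4]
        depth = int(depth)
        if depth == 0:
            counts, mqs = Counter(), []     # empty columns are written as "*"
        else:
            counts = parse_read_bases(bases, ref)
            mqs = [ord(ch) - 33 for ch in map_quals]  # Phred+33, as in SAM
        samples.append({
            "chrom": chrom,
            "pos": pos,
            "depth": depth,
            "counts": dict(counts),
            "mean_mq": sum(mqs) / len(mqs) if mqs else None,
        })
    return samples
```

For example, each line of samtools mpileup -f ref.fa -r chr1:100-100 --output-MQ -Q 0 child.bam mother.bam father.bam could be passed to summarize_pileup_line, giving one dict per BAM in input order.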
I understand that in joint calling, the quality of an individual BAM matters less than that of the cohort as a whole. Still, when someone wants to take advantage of already-processed data to find potential de novo mutations, being able to look at the individual files and have these sample-specific statistics is very helpful.
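As an illustration of how per-sample raw counts feed into de novo screening, here is a minimal sketch of a trio filter. It assumes each sample is given as a dict mapping single-base alleles to raw read counts; the function name and every threshold are hypothetical illustrative defaults, not validated cut-offs.

```python
def is_candidate_de_novo(child, mother, father, alt,
                         min_child_alt=5, min_child_vaf=0.3,
                         min_parent_depth=10, max_parent_alt=0):
    """Flag a possible de novo: alt well supported in the child and
    essentially absent from both adequately covered parents.

    Each sample is a dict of allele -> raw read count, e.g. {"A": 20, "G": 9}.
    Depth is taken as the sum of the counts. Thresholds are illustrative only.
    """
    child_depth = sum(child.values())
    child_alt = child.get(alt, 0)
    if child_depth == 0 or child_alt < min_child_alt:
        return False
    if child_alt / child_depth < min_child_vaf:
        return False
    for parent in (mother, father):
        if sum(parent.values()) < min_parent_depth:
            return False  # too little coverage to call the allele absent
        if parent.get(alt, 0) > max_parent_alt:
            return False  # alt seen in a parent: likely inherited
    return True
```

Requiring a minimum parental depth matters because an allele that is simply unsampled at low coverage looks the same as a truly absent one.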
Thanks in advance.
Tell me if I am missing the point of your question completely. If you have the BAM files, could you not simply use HaplotypeCaller to re-call the variants of interest and retrieve the per-sample statistics you're after?
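To restrict such a re-call to the handful of variants of interest, one option is to write them to a BED file and pass it to HaplotypeCaller with -L. A minimal sketch, assuming the sites are 1-based (chromosome, position) pairs; sites_to_bed and its padding parameter are hypothetical. GATK's --interval-padding (-ip) can also pad intervals itself, and some padding is probably sensible since HaplotypeCaller reassembles reads locally around each site.

```python
def sites_to_bed(sites, padding=0):
    """Convert 1-based (chrom, pos) sites to BED lines (0-based, half-open).

    A 1-based position p becomes the interval [p - 1, p); padding widens it
    on both sides, clamped at the chromosome start.
    """
    lines = []
    for chrom, pos in sites:
        start = max(0, pos - 1 - padding)
        end = pos + padding
        lines.append(f"{chrom}\t{start}\t{end}")
    return "\n".join(lines) + "\n"
```

The resulting file could then be used as, for example, gatk HaplotypeCaller -R ref.fasta -I child.bam -L sites.bed -O child.sites.vcf.gz.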
Regarding your second question, AD and DP are still reported per sample (as FORMAT fields) even in the joint-called VCF. However, GATK filters out reads it considers low quality, and those reads don't contribute to these statistics. Are you after all base calls at particular variants, irrespective of read quality?
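For reading those per-sample values back out of the joint VCF, here is a minimal stdlib-only sketch that parses one tab-separated data line; the function name is hypothetical, and a library such as pysam or cyvcf2 would do the same job more robustly. As I understand GATK's documentation, FORMAT/AD counts only reads deemed informative for an allele, so the AD values can sum to less than DP.

```python
def per_sample_ad_dp(vcf_line, sample_names):
    """Extract per-sample AD and DP from one tab-separated VCF data line.

    sample_names must follow the order of the sample columns in the #CHROM
    header. Missing values (".") and trailing FORMAT fields dropped from a
    sample column are returned as None.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    fmt_keys = fields[8].split(":")
    out = {}
    for name, sample_field in zip(sample_names, fields[9:]):
        # zip stops early if the sample column omits trailing FORMAT fields.
        values = dict(zip(fmt_keys, sample_field.split(":")))
        ad = values.get("AD", ".")
        dp = values.get("DP", ".")
        out[name] = {
            "AD": None if ad == "." else
                  [None if x == "." else int(x) for x in ad.split(",")],
            "DP": None if dp == "." else int(dp),
        }
    return out
```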