Hi, I am having some issues with the VCF files generated by the GATK caller: they do not report a mapping quality (MQ) value for many positions, especially invariant sites.
Since every read in the BAM files has a mapping quality score, I assume there is a way to get that value for every position without using GATK. What are some alternatives? When multiple samples are used, should the MQ simply be averaged across samples at every position?
In case you are wondering how I am using GATK, the relevant commands are below:
java -jar gatk HaplotypeCaller -I file.bam -O file.g.vcf -R reference.fa -ploidy 1 -ERC BP_RESOLUTION
# The above is done for different input files
java -jar gatk CombineGVCFs -R reference.fa -O combined.g.vcf --variant file1.g.vcf --variant file2.g.vcf ...
java -jar gatk GenotypeGVCFs -R reference.fa -V combined.g.vcf -O variants.vcf -ploidy 1 -all-sites
For some reason, this results in many MQ values being absent from the final VCF file (as well as many QUAL values taking an Infinity value).
A per-position MQ is useful when you need to consider all sites, variant and invariant, but want to restrict the analysis to well-aligned sites. We are working with mutation accumulation data, so, as you can imagine, most of the genome is invariant, but we still need to know what fraction of it has high enough quality to be included in the computation of mutation rates.
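To make the multi-sample question concrete, here is a minimal Python sketch of the two options I can think of. It is my own illustration, not GATK's code. It assumes the MAPQ values of the reads covering one position have already been extracted from each BAM (for example from a pileup), and it uses the root mean square, since GATK describes its MQ annotation as RMS mapping quality. The function names and the toy values are hypothetical.

```python
import math


def rms_mapq(mapqs):
    """Root mean square of the MAPQ values of the reads covering one position."""
    if not mapqs:
        return None  # no coverage, so MQ is undefined here
    return math.sqrt(sum(q * q for q in mapqs) / len(mapqs))


def pooled_rms_mapq(per_sample_mapqs):
    """Pool the reads of all samples first, then take a single RMS."""
    sum_sq = 0
    depth = 0
    for mapqs in per_sample_mapqs:
        sum_sq += sum(q * q for q in mapqs)
        depth += len(mapqs)
    return math.sqrt(sum_sq / depth) if depth else None


# Toy MAPQ values at one position (made up for illustration).
sample_a = [60, 60, 60, 60]
sample_b = [0, 60]

pooled = pooled_rms_mapq([sample_a, sample_b])
mean_of_rms = (rms_mapq(sample_a) + rms_mapq(sample_b)) / 2
print(round(pooled, 2), round(mean_of_rms, 2))
```

The two numbers differ whenever the samples have different depths, because a plain average of per-sample values gives a low-coverage sample the same weight as a high-coverage one. So part of my question is which of the two is the appropriate filter for callable sites.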