ExAC data with hg38 coordinates
has been available for filter-based annotation since late 2015, but there seems to be a systematic problem with using lifted-over ExAC and 1000G data for annotation. Mainstream annotation tools (ANNOVAR, VEP and SnpEff) fail to report the MAF for variants whose contig is reversed in the hg38 assembly. As a result, variants that are common in ExAC and 1000G populations may be misinterpreted as novel solely because the annotation software does not report the corresponding MAF.
Has anyone here come across the same problem and, if so, how did you tackle it?
Hi Pablo, thanks for your comment! Yes, you guessed right. Deeper investigation led me to realise that the problem actually lies in the lifted-over dbSNP data. I am now using a custom script to fill in the missing frequencies.
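For anyone who wants to try something similar, here is a minimal sketch of the general idea, not my actual script. It assumes the frequencies are already loaded into a dictionary keyed by (chrom, pos, ref, alt), and that a missed match on a reversed contig can be recovered by retrying with reverse-complemented alleles. The names `lookup_maf` and `freqs` and the toy values are hypothetical. It is only meant for SNVs: indels would also need their position adjusted, and strand-ambiguous SNPs (A/T, C/G) cannot be resolved this way.

```python
# Hypothetical sketch: look up a MAF, retrying with reverse-complemented
# alleles when the exact (chrom, pos, ref, alt) match fails.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def lookup_maf(freqs, chrom, pos, ref, alt):
    """Return (maf, how_it_matched), or (None, None) if nothing matches."""
    key = (chrom, pos, ref, alt)
    if key in freqs:
        return freqs[key], "exact"
    # Assumption: on a reversed contig the alleles are on the opposite strand.
    flipped = (chrom, pos, reverse_complement(ref), reverse_complement(alt))
    if flipped in freqs:
        return freqs[flipped], "reverse_complement"
    return None, None


# Toy frequency table with made-up values, for illustration only.
freqs = {("chr1", 1000, "A", "G"): 0.12}

print(lookup_maf(freqs, "chr1", 1000, "T", "C"))  # (0.12, 'reverse_complement')
```

Recording how each frequency was matched makes it easy to review the rescued variants separately instead of trusting them blindly.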