I have a list of gene variants for which I need to get the exact protein sequence. For example:
ENST00000370378.4:c.3220A>G (GRCh37) which corresponds to p.S1074G
This is in gene KIAA1107. However, if I get the protein sequence for that gene using BioMart, the mapping is not correct for the SNP: there is no S at position 1074. This probably has to do with how long ago this variant was identified. I can figure out the correct sequence "by hand" using an online resource such as the ClinGen Allele Registry.
However, I need to do this programmatically, finding the sequence that corresponds to each SNP in a list of SNPs. Do you have any suggestions?
Are you using the archived BioMart for hg19/GRCh37 (grch37.ensembl.org)? If you are simply using the default BioMart, you are getting the GRCh38 version.
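If you want to script the check, here is a rough Python sketch. It assumes the Ensembl GRCh37 REST server at grch37.rest.ensembl.org and its sequence/id endpoint with type=protein. The function names are made up for illustration, and the regex only handles simple one-letter substitutions such as p.S1074G. Dropping the transcript version (.4) before the lookup is also an assumption, so confirm that the archive returns the transcript version you expect.

```python
import re
import urllib.request

# Assumed base URL of the Ensembl REST server for the GRCh37 archive.
GRCH37_REST = "https://grch37.rest.ensembl.org"


def protein_url(transcript_id):
    """Build the sequence/id URL that asks for a transcript's protein sequence."""
    stable_id = transcript_id.split(".")[0]  # assumption: query without the version suffix
    return f"{GRCH37_REST}/sequence/id/{stable_id}?type=protein"


def fetch_protein(transcript_id):
    """Download the protein sequence as plain text (needs network access)."""
    request = urllib.request.Request(
        protein_url(transcript_id),
        headers={"Content-Type": "text/plain"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode().strip()


def parse_protein_change(hgvs_p):
    """Split a one-letter substitution like 'p.S1074G' into (ref, position, alt)."""
    match = re.fullmatch(r"p\.([A-Z])(\d+)([A-Z])", hgvs_p)
    if match is None:
        raise ValueError(f"unsupported protein change: {hgvs_p}")
    return match.group(1), int(match.group(2)), match.group(3)


def reference_matches(sequence, hgvs_p):
    """Return True if the reference residue sits at the stated 1-based position."""
    ref, position, _alt = parse_protein_change(hgvs_p)
    return position <= len(sequence) and sequence[position - 1] == ref
```

For a list of variants, you could loop over (transcript, protein change) pairs, call fetch_protein for each transcript, and flag every pair where reference_matches returns False. Those are the ones where the sequence you fetched does not correspond to the annotation.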