I wrote an R function to grab rsID genotypes from dbSNP using the Entrez API. For example, here is the XML output that I am trying to parse:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=rs2656176&retmode=xml
The SPDI tag contains the genotype info, so I can easily use xpathSApply to get this tag for many IDs. But for this rsID there are two different values, T>C and T>A.
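For context, here is a minimal sketch of the SPDI parsing step. It assumes the XML package and uses a hand-written fragment rather than the real efetch response: the sequence accession and position are placeholders. The `local-name()` match is there in case the real response declares a default XML namespace, which would otherwise make a plain `//SPDI` path return nothing.

```r
library(XML)

# Simplified, hand-written fragment; accession and position are placeholders,
# not real dbSNP output.
xml_text <- '<ExchangeSet><DocumentSummary uid="2656176">
  <SPDI>NC_000001.1:99:T:A,NC_000001.1:99:T:C</SPDI>
</DocumentSummary></ExchangeSet>'

doc <- xmlParse(xml_text, asText = TRUE)

# Match on the local name so the query works with or without a default namespace.
spdi_raw <- xpathSApply(doc, "//*[local-name()='SPDI']", xmlValue)

# One SPDI string per alternate allele, separated by commas.
spdi <- strsplit(spdi_raw, ",", fixed = TRUE)[[1]]

# SPDI fields are sequence:position:deleted:inserted.
parts <- do.call(rbind, strsplit(spdi, ":", fixed = TRUE))
alleles <- data.frame(ref = parts[, 3], alt = parts[, 4],
                      stringsAsFactors = FALSE)
print(alleles)
```

This gives one row per alt allele (T>A and T>C here), which is where the ambiguity comes from.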
The T>C change is the one that has more support in all pop freq databases, and this is easily grabbed from the dbSNP webpage:
https://www.ncbi.nlm.nih.gov/snp/rs2656176
The "Alt Allele" frequencies show A being nearly 0, while C is the major alt allele.
However, I don't understand how to recover this from the XML above. The 'global maf' tag shows some MAF values, but the 1000G value is reported only for the reference allele, not the alt allele in this case. The ALFA entry works here, but it doesn't work for every ID. Other databases do report the alt "C" allele, but that might not always be the case.
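To make the problem concrete, here is a rough sketch of one possible heuristic: keep only the MAF entries whose allele is one of the SPDI alt alleles, then pick the alt allele with the highest mean frequency across studies. Everything in the fragment is an assumption for illustration: the tag names (`GLOBAL_MAFS`, `MAF`, `STUDY`, `FREQ`) and the `allele=freq/count` format are what the efetch output appears to use, and the study names and numbers are made up. It returns `NA` when no study reports an alt allele, which is exactly the case I'm unsure how to handle.

```r
library(XML)

# Hand-written fragment; tag names and the FREQ format are assumptions,
# and the study names and frequencies are made-up placeholders.
xml_text <- '<DocumentSummary>
  <GLOBAL_MAFS>
    <MAF><STUDY>StudyA</STUDY><FREQ>T=0.40/2000</FREQ></MAF>
    <MAF><STUDY>StudyB</STUDY><FREQ>C=0.35/700</FREQ></MAF>
    <MAF><STUDY>StudyC</STUDY><FREQ>C=0.30/300</FREQ></MAF>
  </GLOBAL_MAFS>
  <SPDI>NC_000001.1:99:T:A,NC_000001.1:99:T:C</SPDI>
</DocumentSummary>'

doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper: text of every element with the given local name.
get_values <- function(tag) {
  xpathSApply(doc, sprintf("//*[local-name()='%s']", tag), xmlValue)
}

# Split each "allele=freq/count" string into allele and numeric frequency.
freq <- get_values("FREQ")
maf <- data.frame(
  study  = get_values("STUDY"),
  allele = sub("=.*$", "", freq),
  freq   = as.numeric(sub("^[^=]*=([^/]*)/.*$", "\\1", freq)),
  stringsAsFactors = FALSE
)

# Alt alleles are the fourth SPDI field (the inserted sequence).
spdi <- strsplit(get_values("SPDI"), ",", fixed = TRUE)[[1]]
alt_alleles <- vapply(strsplit(spdi, ":", fixed = TRUE), `[`, character(1), 4)

# Keep studies that report an alt allele; pick the one with the highest mean frequency.
alt_maf <- maf[maf$allele %in% alt_alleles, ]
best_alt <- if (nrow(alt_maf) > 0) {
  names(which.max(tapply(alt_maf$freq, alt_maf$allele, mean)))
} else {
  NA_character_
}
print(best_alt)
```

With this fragment the reference-allele entry (StudyA) is ignored and the choice falls to "C", but the result depends entirely on which studies happen to report an alt allele for a given ID.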
Does anyone have an idea of what tags I should look at? Or is there a better way to hit a different API to get this data instead?
Thanks!