I have a single-column data frame in R with nearly 143 dbSNP listings (in the form of rs*) and I want to fill in more columns with data such as the ancestral allele, MAF, etc. What is the quickest way to do that?
I'd use reutils. It's an R wrapper for eutils. eutils can get you all the information you need, and you can pass a comma-separated list of input IDs so that a single command processes them and gives you the output in one chunk.
For example, to get data on rs869312219 and rs869312218, my eutils command would be: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=rs869312219,rs869312218
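If the IDs are already sitting in a data frame column, the comma-separated list can be built in base R. This is just a minimal sketch: the data frame name `snps` and the column name `rsid` are made up for illustration, and the base URL is the one from the example above.

```r
# Hypothetical single-column data frame of rs IDs (names are illustrative).
snps <- data.frame(rsid = c("rs869312219", "rs869312218"),
                   stringsAsFactors = FALSE)

# Base URL copied from the example above.
base_url <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Collapse all IDs into one comma-separated string so that a single
# request covers the whole column.
id_list <- paste(snps$rsid, collapse = ",")
efetch_url <- paste0(base_url, "?db=snp&id=", id_list)

print(efetch_url)
```

With 143 IDs the same two lines apply unchanged, since `paste(..., collapse = ",")` does not care how long the column is.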
You can check out this tutorial to help you transform eutils REST URLs to reutils commands: https://github.com/gschofl/reutils
I still can't find how to determine the ancestral allele, although I can get all the rest of the information, which is perfect. Try, for example, efetch("422628", "snp", "docset") for rs422628. I wish there was a way to index "T", which is the ancestral allele in the output file.
You'll have to drill down to the ancestralAllele attribute in the Sequence tag (xpath ExchangeSet/Rs/Sequence/@ancestralAllele). This will be available only for entries with an ancestral allele listed. Add a retMode=xml option to the query so you can traverse using xpath.
EDIT: I did this using the convoluted
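For what it's worth, here is one way the xpath lookup described above could look with the xml2 package. It is only a sketch and not necessarily the approach referred to in the edit. The XML string is a hand-written stand-in for the efetch response, trimmed to the elements named in the xpath, with a made-up second entry that has no ancestral allele; it is not real output. A real response may also declare a default namespace, in which case something like xml2's xml_ns_strip() would probably be needed before the plain xpath matches.

```r
library(xml2)

# Hand-written stand-in for an efetch XML response: only the elements named
# in the xpath are kept, and the second rsId is a made-up placeholder.
xml_text <- '<ExchangeSet>
  <Rs rsId="422628"><Sequence ancestralAllele="T"/></Rs>
  <Rs rsId="1"><Sequence/></Rs>
</ExchangeSet>'

doc <- read_xml(xml_text)

# Walk the Rs nodes one by one so that entries without an ancestral allele
# still get a row (NA) instead of being silently dropped.
rs_nodes <- xml_find_all(doc, "/ExchangeSet/Rs")
seq_nodes <- xml_find_first(rs_nodes, "./Sequence")

result <- data.frame(
  rsid = paste0("rs", xml_attr(rs_nodes, "rsId")),
  ancestral_allele = xml_attr(seq_nodes, "ancestralAllele"),
  stringsAsFactors = FALSE
)

print(result)
```

Going through the Rs nodes rather than selecting @ancestralAllele directly keeps the result aligned with the input IDs, which matters when the values are merged back into the original data frame.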