Hi,
I would like to get the SNPs associated with a gene from EBI via an API request in Python. I found this tutorial:
It is helpful for getting a trait or SNPs, which is already good! However, I am not really good with APIs and don't know how to change this query into a gene query.
This is from the tutorial:
import requests
import pandas as pd

# API address:
apiUrl = 'https://www.ebi.ac.uk/gwas/summary-statistics/api'
trait = "EFO_0001360"
p_upper = "0.000000001"
requestUrl = '%s/traits/%s/associations?p_upper=%s&size=10' % (apiUrl, trait, p_upper)
response = requests.get(requestUrl, headers={"Content-Type": "application/json"})

# The returned response is a "response" object, from which we have to extract and parse the information:
decoded = response.json()

def getStudy(studyLink):
    # Accessing data for a single study:
    response = requests.get(studyLink, headers={"Content-Type": "application/json"})
    decoded = response.json()
    # Follow the link to the GWAS Catalog record of this study:
    gwasData = requests.get(decoded['_links']['gwas_catalog']['href'], headers={"Content-Type": "application/json"})
    decodedGwasData = gwasData.json()
    traitName = decodedGwasData['diseaseTrait']['trait']
    pubmedId = decodedGwasData['publicationInfo']['pubmedId']
    return (traitName, pubmedId)

# Collect one row per association, enriched with study-level information:
extractedData = []
for association in decoded['_embedded']['associations'].values():
    pvalue = association['p_value']
    variant = association['variant_id']
    studyID = association['study_accession']
    studyLink = association['_links']['study']['href']
    traitName, pubmedId = getStudy(studyLink)
    extractedData.append({'variant': variant,
                          'studyID': studyID,
                          'pvalue': pvalue,
                          'traitName': traitName,
                          'pubmedID': pubmedId})

ssWithGWASTable = pd.DataFrame.from_dict(extractedData)
ssWithGWASTable
In decoded you get this:
'trait': [{'href': 'https://www.ebi.ac.uk/gwas/summary-statistics/api/traits/EFO_0001360'}],
which I guess is where to change it (maybe with /summary-statistics/gene/... ??). But I am not really good with APIs and hope to get some pointers or solutions here.
Thanks, Simon
Mihai! Thank you so much. I know I am asking a lot here, but did you figure out how to do it in Python, and how to solve the trouble Python had with the HTTP request format?
Right, so that ended up being just a few lines of code for the undocumented https://www.ebi.ac.uk/gwas/api/search/advancefilter API, but I think it's not a good idea to use it because the output JSON is huge and its schema isn't clear. It also happens to stream multiple top-level JSON objects, so you'll probably need an advanced parser to deal with the returned data. Basically, you get something like this: {...}{...}...{...}, where each {...} is an independent JSON document.
Here's how to call it for the same example gene:
For the documented API I mentioned above, things are much easier, although the API seems a bit slow to respond, since it takes several seconds before you get the JSON data back. However, you do get a single JSON object in the response and, like I mentioned above, the pagination works too.
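As for the concatenated {...}{...} output of the undocumented endpoint, the standard library can cope with it without an external parser. A minimal sketch, assuming the whole response body fits in memory as one string; iter_json_documents is a name made up for illustration and is not part of any EBI client:

```python
import json


def iter_json_documents(text):
    """Yield each top-level JSON document from a string of concatenated documents."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the parsed object and the index where it stopped.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Hypothetical payload in the same shape as described above.
sample = '{"a": 1}{"b": [2, 3]} {"c": {}}'
documents = list(iter_json_documents(sample))
```

With a real response you would pass response.text instead of sample. For a payload too large to hold in memory, an incremental parser would be the better choice.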
Hi Mihai,
I had a look at the output. It works, but the output is something related to eQTLs. That is close, but it's not the (disease) traits associated with a gene. I was naive enough to try this approach:
After looking at this documentation: https://www.ebi.ac.uk/gwas/rest/docs/api#_example_request_2
Output was kind of spaghetti, any pointers?
Hey Simon, sorry about that... It did seem like it was too easy to be correct :)
I looked into that API endpoint, but I think it's not returning what you want (see below)... Maybe this code I just found does the job? https://github.com/KatrionaGoldmann/omicAnnotations/blob/5d0a4dfda6ca55349408f2c6ee0792a02004696f/R/associated_publications.R I'm not skilled enough at R, but it looks like the heavy lifting is done by this package: https://cran.r-project.org/web/packages/easyPubMed/index.html
What I was able to do with that endpoint you provided is:
So, I have been in contact with EBI and those nice guys forwarded me this idea:
Another idea was this one in bash (but it was not reproducible for some genes):
Can you please add that part to your answer above, Mihai, so that I can accept your answer (which I already did)?
Hehe, of course they have some other API... This one is a bit nasty, because it doesn't let you filter on the gene name before download, but, luckily, the downloaded file is just 161MB.
I got this to work, for example, for gene TP53BP1:
And with Python:
You could download the file once and then load it from disk instead of fetching it each time you run this script.
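A minimal sketch of that caching idea, assuming the download is a tab-separated file with a header row. The function names, the MAPPED_GENE column name and the separators used to split multi-gene cells are assumptions for illustration, so check them against the header of the file you actually download:

```python
import csv
import os
import re
import urllib.request


def load_cached(url, cache_path):
    """Download url to cache_path only if that file is not on disk yet."""
    if not os.path.exists(cache_path):
        urllib.request.urlretrieve(url, cache_path)
    return cache_path


def rows_for_gene(tsv_path, gene, gene_column="MAPPED_GENE"):
    """Return the rows whose gene column lists `gene` as one of its entries.

    Assumption: a cell may hold several genes separated by ',', ';' or ' - '.
    """
    matches = []
    with open(tsv_path, encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            cell = row.get(gene_column) or ""
            genes = [g.strip() for g in re.split(r"[,;]|\s+-\s+", cell)]
            if gene in genes:
                matches.append(row)
    return matches
```

Usage would be along the lines of rows_for_gene(load_cached(download_url, "associations.tsv"), "TP53BP1"), where download_url is whatever address you fetch the file from. Matching whole gene names rather than substrings avoids picking up unrelated genes that merely share a prefix.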