I would like to use a gene name (e.g. DOCK2) or its Ensembl ID as input and get as output the GWAS traits reported for that gene. I looked at the GWAS Catalog API documentation, but I didn't find a way to use a gene as input. I also looked for R and Python libraries that do this and couldn't find any (maybe I didn't search well).
I know it's possible to do that for associations, studies, SNPs, and efoTraits. For instance, I could just use the URL https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/rs4918943, which returns info for the SNP "rs4918943". I would like something similar for a gene ID, or a library that does it.
It would be great to input a gene and receive any output from which I could retrieve the GWAS traits related to that gene.
Thanks in advance!
where we should replace {gene} with the gene symbol (e.g. TP53).
Example using R:
# Required library (readr::read_delim() would work as well)
library(data.table)
# Define the function
genesymbol2gwas <- function(gene) {
  url <- paste0(
    "https://www.ebi.ac.uk/gwas/api/search/downloads?q=ensemblMappedGenes:", gene,
    "&pvalfilter=&orfilter=&betafilter=&datefilter=&genomicfilter=&genotypingfilter[]=&traitfilter[]=&dateaddedfilter=&facet=association&efo=true"
  )
  # fread() downloads the table from the URL and returns it as a data.table
  return(fread(url))
}
# Use the function for gene "TP53"
genesymbol2gwas("TP53")
Example using Python:
# Load library to read the result table as a dataframe
import pandas as pd
# Define the function
def genesymbol2gwas(gene):
    url = "https://www.ebi.ac.uk/gwas/api/search/downloads?q=ensemblMappedGenes:{}&pvalfilter=&orfilter=&betafilter=&datefilter=&genomicfilter=&genotypingfilter[]=&traitfilter[]=&dateaddedfilter=&facet=association&efo=true"
    return pd.read_csv(url.format(gene), sep='\t')
# Use the function for gene "TP53"
genesymbol2gwas("TP53")
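Since the question asks for traits rather than the full association table, here is a minimal sketch of one way to pull a de-duplicated trait list out of a column of the result. The helper name unique_traits is hypothetical, and the column name "DISEASE/TRAIT" in the usage note is an assumption about the downloaded file, so check the actual header first.

```python
def unique_traits(values):
    """Return the sorted, de-duplicated trait names found in an iterable.

    Non-string entries (e.g. NaN for missing cells) are skipped.
    """
    traits = set()
    for value in values:
        if isinstance(value, str):
            cleaned = value.strip()
            if cleaned:
                traits.add(cleaned)
    return sorted(traits)
```

Assuming the table has a "DISEASE/TRAIT" column, this could be called as unique_traits(genesymbol2gwas("TP53")["DISEASE/TRAIT"]).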
Thanks very much @cmcouto.silva for sharing this helpful solution :)
I just wanted to add that, after running the code, one may notice differences between the gene given in the query and the genes in the 'MAPPED_GENES' column. This is because the code retrieves data for a query gene the same way one would by pressing the 'Download Associations' button on a gene page in the GWAS Catalog browser. From the GWAS Catalog FAQ page: "The “Download Associations” button downloads a spreadsheet (.tsv) of the full data for every association displayed on the current page."
If the user only wishes to retrieve associations for that particular gene (i.e. what one would get by pressing the 'Export data' button in the GWAS Catalog browser instead of the 'Download Associations' button), then the following post-filtering step can be applied (edited R example)
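A post-filter of this kind might look roughly like the sketch below, written in Python rather than R. It keeps only the rows whose mapped-gene field lists the query gene as a whole symbol. The function names, the default column name "MAPPED_GENE", and the separators (comma, semicolon, or a hyphen surrounded by spaces) are all assumptions, so adjust them to match the downloaded table. It is not guaranteed to reproduce the 'Export data' output exactly.

```python
import re

# Assumed separators between gene symbols in the mapped-gene field:
# "A, B", "A; B", or "A - B". A hyphen without surrounding spaces
# (as in "HLA-DRB1") is treated as part of the symbol.
_GENE_SEPARATORS = re.compile(r"\s*[,;]\s*|\s+-\s+")


def has_mapped_gene(value, gene):
    """Return True if `gene` is one of the symbols listed in `value`."""
    if not isinstance(value, str):  # e.g. NaN for an empty cell
        return False
    symbols = {s.strip().upper() for s in _GENE_SEPARATORS.split(value)}
    return gene.strip().upper() in symbols


def filter_by_mapped_gene(df, gene, column="MAPPED_GENE"):
    """Keep only the rows of a pandas DataFrame whose `column` lists `gene`."""
    mask = df[column].map(lambda value: has_mapped_gene(value, gene))
    return df[mask]
```

For example, filter_by_mapped_gene(genesymbol2gwas("TP53"), "TP53") would drop rows mapped only to neighbouring genes, such as TP53BP1.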
I have the same problem. Did you manage to solve it?
Hi! Yes, I did! I just posted the solution I came across.