Calculate the allele frequency of a specific sub-population using VEP
5.6 years ago

I have more than 800 SNPs with rs IDs, and I want to calculate their allele frequencies in the British (GBR) population from the 1000 Genomes Project. I can get allele frequencies for a super-population such as European (EUR) using the VEP script, but I cannot find a way to get them for a specific population such as GBR.

vep allele frequency • 2.5k views
5.6 years ago
Emily 24k

Unfortunately VEP doesn't give the sub-population frequencies, only the super-populations. To get the sub-populations, you will need to use the Ensembl REST API. I would recommend using the variation_post endpoint, although you will need to split your variant list into chunks of at most 200 variants. You can then parse the GBR frequency out of the JSON. If you're not already familiar with using REST APIs and parsing JSON, we've got an online course.
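
A minimal sketch of the chunk-and-POST approach described above, in R with httr and jsonlite. The helper names chunk_ids and fetch_chunk are hypothetical, not part of any package. The path /variation/homo_sapiens and the pops=1 parameter are assumptions based on the Ensembl REST documentation, so please check them against the current docs.

```r
library(httr)
library(jsonlite)

# Split a character vector of rs IDs into chunks of at most `size` IDs.
chunk_ids <- function(ids, size = 200) {
  if (length(ids) == 0) return(list())
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

# POST one chunk of rs IDs and return the parsed JSON as a nested list.
# Assumption: the endpoint accepts a JSON body of the form {"ids": [...]}
# and returns one record per rs ID when called with pops=1.
fetch_chunk <- function(ids, server = "https://rest.ensembl.org") {
  r <- POST(
    paste0(server, "/variation/homo_sapiens"),
    query = list(pops = 1),
    content_type("application/json"),
    accept("application/json"),
    body = as.character(toJSON(list(ids = ids)))
  )
  stop_for_status(r)
  fromJSON(content(r, as = "text", encoding = "UTF-8"), simplifyVector = FALSE)
}
```

With your own character vector of rs IDs (called my_rsids here for illustration), something like do.call(c, lapply(chunk_ids(my_rsids), fetch_chunk)) should give one combined list of records. Adding a short pause between requests may help you stay within the server's rate limits.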


Thank you, got the answer

Installing an Ensembl REST API package in R:

On Windows, first install the latest Rtools from: https://cran.r-project.org/bin/windows/Rtools/

# devtools provides install_github()
install.packages("devtools")
devtools::install_github("timyates/EnsemblRest")
library(EnsemblRest)
# Note: the httr-based example below does not use this package

R code for querying the Ensembl REST API:

library(httr)
library(jsonlite)

server <- "https://rest.ensembl.org"
# pops=1 asks for population allele frequencies to be included
ext <- "/variation/human/rs56116432?pops=1"

r <- GET(paste0(server, ext), content_type("application/json"))

# stop with an error if the request failed
stop_for_status(r)

# use this if you get a simple nested list back, otherwise inspect its structure
# head(data.frame(t(sapply(content(r),c))))
head(fromJSON(toJSON(content(r))))
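
To get from that output to the GBR numbers, a sketch along these lines might help. It assumes each record has a populations list whose entries carry population, allele and frequency fields, and that the British sample is labelled "1000GENOMES:phase_3:GBR"; both are assumptions to check against your own output. The function name population_freqs is hypothetical.

```r
# Return the allele frequencies for one population from a parsed variant
# record (a nested list, e.g. what content(r) gives for a single rs ID).
population_freqs <- function(record, population = "1000GENOMES:phase_3:GBR") {
  # keep only the entries whose population label matches exactly
  keep <- Filter(function(p) identical(p$population, population),
                 record$populations)
  data.frame(
    allele = vapply(keep, function(p) as.character(p$allele), character(1)),
    frequency = vapply(keep, function(p) as.numeric(p$frequency), numeric(1)),
    stringsAsFactors = FALSE
  )
}
```

Calling population_freqs(content(r)) on the response above should then give one row per allele for GBR, or an empty data frame if that population is not reported for the variant.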

Dear Emily, your post above helped us a lot! Two questions: 1) Is the MAF reported in the JSON always based on 1000 Genomes? 2) Are there other databases that provide sub-population resolution in addition to 1000 Genomes?


The overall MAF always comes from 1000 Genomes, but if you get population MAFs they will come from different sources, and each one is labelled with its source.
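
To see which sources are present for a given variant, a small sketch like the following might be useful. It assumes the population labels follow a "SOURCE:..." pattern (for example "1000GENOMES:phase_3:GBR"), which is an assumption about the naming convention rather than something guaranteed by the API. The function name population_sources is hypothetical.

```r
# List the distinct source prefixes found in a parsed variant record.
# Assumption: each population label starts with its source, followed by ":".
population_sources <- function(record) {
  labels <- vapply(record$populations,
                   function(p) as.character(p$population),
                   character(1))
  # drop everything from the first colon onwards, then de-duplicate
  sort(unique(sub(":.*$", "", labels)))
}
```

Running this on a record fetched with pops=1 gives a character vector of source names, which you can use to decide which population labels to filter on.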
