BioMart doesn't work in R for large input data. How can I run it in Python?
16 months ago
dzisis1986 ▴ 70

I am trying to use BioMart to retrieve the consequence types for a list of variants (identified by rs IDs), but because my file is large (80,621 entries) I am getting the error below. I chose BioMart over VEP hoping to save time on the query.

Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: [www.ensembl.org:443] Operation timed out after 300010 milliseconds with 73501 bytes received

This is my R script:

library(biomaRt)
# read the rsIDs; the file has no header row, so they end up in column V1
Data <- read.delim("pha005198_data.csv", sep = ";", header = FALSE)
head(Data$V1)
# list all available marts and datasets (biomartr is a separate package from biomaRt)
biomartr::getMarts()
head(biomartr::getDatasets(mart = "ENSEMBL_MART_SNP"), 15)
snpmart <-
  useMart(biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
## list the available attributes in this mart
listAttributes(mart = snpmart)

res <- getBM(
  attributes = c(
    "refsnp_id",
    "reg_consequence_types"
  ),
  filters = "snp_filter",
  values = Data$V1,
  mart = snpmart,
  uniqueRows = TRUE
)
head(res)
# write the result table to CSV
write.csv(res,file='rsID_biomart_out.csv', row.names=FALSE)

Do you know any way to run this in Python or to make it work for my input data?

biomart R Python • 1.3k views

Run your query in batches. It doesn't matter whether you use R or Python: the API limits are the same whichever language you use to access the API.
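
To illustrate that point, here is a rough sketch of what a single batch could look like from Python, using only the standard library. It assumes the usual Ensembl BioMart `martservice` endpoint and its XML query format; the dataset, filter, and attribute names are copied from the R script in the question, while `MARTSERVICE_URL`, `build_query_xml`, and `build_request_body` are hypothetical names made up for this example. The sketch only builds the request and does not send it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: the standard Ensembl BioMart endpoint (not taken from the thread).
MARTSERVICE_URL = "https://www.ensembl.org/biomart/martservice"


def build_query_xml(rsids, attributes=("refsnp_id", "reg_consequence_types")):
    """Build a BioMart XML query for ONE batch of rsIDs (hypothetical helper)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    dataset = ET.SubElement(
        query, "Dataset", {"name": "hsapiens_snp", "interface": "default"}
    )
    # The filter takes the whole batch as one comma-separated value.
    ET.SubElement(
        dataset, "Filter", {"name": "snp_filter", "value": ",".join(rsids)}
    )
    for name in attributes:
        ET.SubElement(dataset, "Attribute", {"name": name})
    header = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return header + ET.tostring(query, encoding="unicode")


def build_request_body(rsids):
    """URL-encode the XML as the 'query' form field, ready to be POSTed."""
    return urlencode({"query": build_query_xml(rsids)}).encode("ascii")
```

Sending the body with something like `urllib.request.urlopen(MARTSERVICE_URL, data=build_request_body(batch), timeout=300)` should return tab-separated text, but the same server-side time limit applies, so the batch still has to be small enough to finish in time.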


How do I run it in batches? Can you give an example? I would prefer to run it in Python, so an example in Python would be better. Is there any other quick way to get consequence information for my rsIDs?


I'm not good at Python and also, I'm not going to write your code for you - I can guide you to the approach but not do your work for you.

Instead of passing all of Data$V1 to getBM, try using a loop that passes chunks of the vector, say 500 values at a time. Then collect the results and collate/merge them.
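
The same chunk-and-collate idea can be sketched in Python. This is only an outline under stated assumptions: `chunked` and `query_in_batches` are hypothetical helper names, and `fetch_batch` stands for whatever function you write to query BioMart for one chunk and return its rows as tuples; it is not part of any library.

```python
from itertools import islice


def chunked(values, size=500):
    """Yield successive lists of at most `size` items from `values`."""
    it = iter(values)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def query_in_batches(rsids, fetch_batch, size=500):
    """Run `fetch_batch` on each chunk of rsIDs and collate the rows.

    `fetch_batch` is a caller-supplied (hypothetical) function that takes
    a list of rsIDs and returns an iterable of hashable rows, for example
    (rsid, consequence) tuples.
    """
    seen = set()
    rows = []
    for batch in chunked(rsids, size):
        for row in fetch_batch(batch):
            # Drop exact duplicates, similar in spirit to uniqueRows = TRUE.
            if row not in seen:
                seen.add(row)
                rows.append(row)
    return rows
```

With 80621 rsIDs and chunks of 500 this makes 162 requests. A failed or timed-out chunk can then be retried on its own instead of restarting the whole query.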


I didn't ask you to do my work or write my code for me. Don't worry, and don't take it personally! Thanks for your advice, but it doesn't fit my case.


You can download the VCF files, which include the consequences, from http://ftp.ensembl.org/pub/current_variation/vcf/homo_sapiens/ (you could fetch only the chromosomes you are interested in) and search them locally. No API limits to worry about.
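
A local search can be as simple as streaming the file once and keeping the records whose ID column matches your rsIDs. The sketch below is a minimal version of that idea, with assumptions: the function names are made up for this example, and the INFO key holding the consequences is assumed to be `VE`, which you should check against the `##INFO` header lines of the file you actually download.

```python
import gzip


def parse_info(info):
    """Split a VCF INFO string into a dict; flag entries map to True."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def lookup_consequences(lines, wanted, info_key="VE"):
    """Scan VCF lines and return {rsid: raw INFO value} for wanted IDs.

    `info_key` is an ASSUMPTION about where the consequences are stored.
    If an rsID occurs on several records, the last one wins.
    """
    found = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        for rsid in fields[2].split(";"):  # the ID column may hold several IDs
            if rsid in wanted:
                found[rsid] = parse_info(fields[7]).get(info_key)
    return found


def lookup_in_file(path, wanted, info_key="VE"):
    """Stream a gzip-compressed VCF from disk without loading it into memory."""
    with gzip.open(path, "rt") as handle:
        return lookup_consequences(handle, set(wanted), info_key)
```

Because `wanted` is a set, each record costs one constant-time membership test, so a single pass over the file handles all 80621 rsIDs at once. rsIDs that are found but lack the assumed key map to `None`, and rsIDs that never appear in the file are simply absent from the result.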


That looks promising, thank you! I need all chromosomes: every time I have a GWAS file with variants (rsIDs), I want my pipeline to include a consequence step that tells me whether each variant is intergenic, intronic, etc., adding an extra column with the consequences for each variant.
