16 months ago
dzisis1986
I am trying to use BioMart to retrieve the consequence types for a list of variants (with rs IDs), but because my file is too big (80,621 entries) I am getting an error. I was thinking of using BioMart instead of VEP to save time on my query.
Error in curl::curl_fetch_memory(url, handle = handle) :
Timeout was reached: [www.ensembl.org:443] Operation timed out after 300010 milliseconds with 73501 bytes received
This is my R script:

library(biomaRt)

Data <- read.delim("pha005198_data.csv", sep = ";", header = FALSE)
Data$V1

# list all available marts
biomartr::getMarts()
head(biomartr::getDatasets(mart = "ENSEMBL_MART_SNP"), 15)

snpmart <- useMart(biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")

## list the available attributes in this mart
listAttributes(mart = snpmart)

res <- getBM(
  attributes = c(
    "refsnp_id",
    "reg_consequence_types"
  ),
  filters = "snp_filter",
  values = Data$V1,
  mart = snpmart,
  uniqueRows = TRUE
)
head(res)

# write table with the result
write.csv(res, file = "rsID_biomart_out.csv", row.names = FALSE)
Do you know any way to run this in Python or to make it work for my input data?
Run your query in batches. It doesn't matter if you're using R or python, problems with API limits won't change because you use a different language to access the API.
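For instance, splitting the 80,621 rs IDs into fixed-size batches takes only a few lines in either language. A Python sketch (the 500-per-batch size is just a conservative illustrative choice, not an Ensembl-documented limit):

```python
def chunked(values, size=500):
    """Split a list into consecutive batches of at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]

# With 80,621 IDs and batches of 500, you get 162 batches,
# the last one holding the 121 leftover IDs.
rsids = [f"rs{i}" for i in range(80621)]
batches = list(chunked(rsids))
print(len(batches), len(batches[-1]))  # 162 121
```

Each batch then becomes one small query instead of a single 80,621-ID request that times out.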
How do I run it in batches? Can you give an example? I would prefer to run it in Python, so a Python example would be even better! Is there any other alternative to get consequence information for my rs IDs quickly?
I'm not good at Python and also, I'm not going to write your code for you - I can guide you to the approach but not do your work for you.
Instead of passing all of Data$V1 to getBM, try using a loop that passes chunks of the vector, say 500 values at a time. Then collect the results and collate/merge them.

I didn't ask you to work for me and write my code. Don't worry and don't take it personally!!! Thanks for your advice, but it doesn't fit my case.
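The loop-and-collate approach looks roughly like this in Python. `query_batch` is a hypothetical stand-in for whatever performs the actual lookup (a biomaRt call, or an Ensembl REST request); here it just fabricates placeholder rows so the control flow is runnable:

```python
def query_batch(rsids):
    # Hypothetical placeholder for the real per-batch query:
    # returns one (rsid, consequence) row per input ID.
    return [(rsid, "unknown") for rsid in rsids]

def annotate_in_batches(rsids, batch_size=500):
    """Query `rsids` in chunks of `batch_size` and collate all rows."""
    results = []
    for start in range(0, len(rsids), batch_size):
        results.extend(query_batch(rsids[start:start + batch_size]))
    return results

rows = annotate_in_batches([f"rs{i}" for i in range(1200)], batch_size=500)
print(len(rows))  # 1200 rows collected from three batches (500 + 500 + 200)
```

Swapping the placeholder for a real query function is the only change needed; the chunking and merging logic stays the same.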
You can download the VCF (you could get only the chromosomes you are interested in), including the consequences, from http://ftp.ensembl.org/pub/current_variation/vcf/homo_sapiens/ and search locally. No API limits to worry about.
That looks promising, thank you! I need all chromosomes because every time I have a GWAS file with variants (rs IDs), I want to add a consequence step to my pipeline so I know whether each variant is intergenic, intronic, etc., extracting an extra column with the consequences for each variant.