16 months ago
dzisis1986
I am trying to use BioMart to retrieve the consequence types for a list of variants (with rs IDs), but because my file is too big (80,621 entries) I am getting an error. I was thinking of using BioMart instead of VEP to save time on my query.
Error in curl::curl_fetch_memory(url, handle = handle) :
Timeout was reached: [www.ensembl.org:443] Operation timed out after 300010 milliseconds with 73501 bytes received
This is my R script:

library(biomaRt)

Data <- read.delim("pha005198_data.csv", sep = ";", header = FALSE)
Data$V1

# list all available marts
biomartr::getMarts()
head(biomartr::getDatasets(mart = "ENSEMBL_MART_SNP"), 15)

snpmart <- useMart(biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")

## list the available attributes in this mart
listAttributes(mart = snpmart)

res <- getBM(
  attributes = c(
    "refsnp_id",
    "reg_consequence_types"
  ),
  filters = "snp_filter",
  values = Data$V1,
  mart = snpmart,
  uniqueRows = TRUE
)
head(res)

# write table with the result
write.csv(res, file = "rsID_biomart_out.csv", row.names = FALSE)
Do you know any way to run this in Python or to make it work for my input data?
Run your query in batches. It doesn't matter if you're using R or python, problems with API limits won't change because you use a different language to access the API.
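For instance, splitting the 80,621 rs IDs into fixed-size batches takes only a few lines in either language. A Python sketch (the 500-per-batch size is just a conservative illustrative choice, not an Ensembl-documented limit):

```python
def chunked(values, size=500):
    """Split a list into consecutive batches of at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]

# With 80,621 IDs and batches of 500, you get 162 batches,
# the last one holding the 121 leftover IDs.
rsids = [f"rs{i}" for i in range(80621)]
batches = list(chunked(rsids))
print(len(batches), len(batches[-1]))  # 162 121
```

Each batch then becomes one small query instead of a single 80,621-ID request that times out.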
How do I run it in batches? Can you give an example? I would prefer to run it in Python, so a Python example would be even better! Is there any other alternative to get consequence information for my rs IDs quickly?
I'm not good at Python and also, I'm not going to write your code for you - I can guide you to the approach but not do your work for you.
Instead of passing all of Data$V1 to getBM, try using a loop that passes chunks of the vector, say 500 values at a time. Then collect the results and collate/merge them.

I didn't ask you to work for me and write my code. Don't worry and don't take it personally!!! Thanks for your advice, but it doesn't fit my case.
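The loop-and-collate approach looks roughly like this in Python. `query_batch` is a hypothetical stand-in for whatever performs the actual lookup (a biomaRt call, or an Ensembl REST request); here it just fabricates placeholder rows so the control flow is runnable:

```python
def query_batch(rsids):
    # Hypothetical placeholder for the real per-batch query:
    # returns one (rsid, consequence) row per input ID.
    return [(rsid, "unknown") for rsid in rsids]

def annotate_in_batches(rsids, batch_size=500):
    """Query `rsids` in chunks of `batch_size` and collate all rows."""
    results = []
    for start in range(0, len(rsids), batch_size):
        results.extend(query_batch(rsids[start:start + batch_size]))
    return results

rows = annotate_in_batches([f"rs{i}" for i in range(1200)], batch_size=500)
print(len(rows))  # 1200 rows collected from three batches (500 + 500 + 200)
```

Swapping the placeholder for a real query function is the only change needed; the chunking and merging logic stays the same.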
You can download the VCF (you could get only the chromosomes you are interested in), including the consequences, from http://ftp.ensembl.org/pub/current_variation/vcf/homo_sapiens/ and search locally. No API limits to worry about.
That looks promising, thank you! I need all chromosomes because every time I have a GWAS file with variants (rs IDs), I want to add a consequence step to my pipeline so I know whether each variant is intergenic, intronic, etc., extracting an extra column with the consequences for each variant.