5.5 years ago
amandinelecerfdefer
Hello,
I have a file containing a list of rsIDs, and I want to retrieve the gene and transcript IDs corresponding to each rsID. Here is my script:
install.packages('BiocManager', repos='http://cran.us.r-project.org')
BiocManager::install(c("biomaRt"))
library(biomaRt)
Data <- read.delim("/Users/amandinelecerfdefer/Desktop/Modification_vcf/cut/rsID_origine.txt2.txt")
snpmart <-
useMart(biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
T1<-Sys.time()
T1
res <- getBM(
attributes = c(
"refsnp_id",
"ensembl_gene_stable_id",
"ensembl_transcript_stable_id"
),
filters = "snp_filter",
values = Data$rsID,
mart = snpmart,
uniqueRows = TRUE
)
T2<-Sys.time()
T2
write.csv(res, file = "/Users/amandinelecerfdefer/Desktop/Modification_vcf/name_cut/recovery_gene_trans_original2.txt")
Tdiff= difftime(T2, T1)
Tdiff
write.csv(Tdiff, file = "/Users/amandinelecerfdefer/Desktop/Modification_vcf/time/time2.txt")
Last week this script worked very well, but for the past few days it has failed every time with the same error.
This is the error:
> res <- getBM(
+ attributes = c(
+ "refsnp_id",
+ "ensembl_gene_stable_id",
+ "ensembl_transcript_stable_id"
+ ),
+ filters = "snp_filter",
+ values = Data$rsID,
+ mart = snpmart,
+ uniqueRows = TRUE
+ )
Batch submitting query [=======>-----------------------------------------------------] 13% eta: 2h
Error in getBM(attributes = c("refsnp_id", "ensembl_gene_stable_id", "ensembl_transcript_stable_id"), :
The query to the BioMart webservice returned an invalid result: biomaRt expected a character string of length 1.
Please report this on the support site at http://support.bioconductor.org
How can I fix this error and make the script work?
Thank you.
Hi. My file contains 17 million lines. After hitting this error, I decided to split it into sub-files of 100,000 lines each. Example of part of a file:
You can't use BioMart with a file 17 million lines long. You could use our APIs or you could parse the data out of the VCF files with consequences.
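For readers wondering what "parsing the data out of the VCF" could look like in R, here is a minimal sketch, not an official Ensembl method. It assumes a VEP-style annotated VCF in which the annotations sit in an INFO key named CSQ and the header declares the sub-field order after "Format: ", including sub-fields named Gene, Feature and Feature_type. The function name parse_csq_vcf and the demo records are hypothetical; check the header of your own file, because the key and sub-field names may differ.

```r
# Sketch: pull rsID, gene and transcript IDs out of a VEP-style annotated VCF.
# Assumptions (not from the thread): annotations are stored under INFO key CSQ,
# and the header lists the sub-field order after "Format: ".
parse_csq_vcf <- function(lines) {
  # Locate the header line describing the CSQ sub-fields
  hdr <- grep("^##INFO=<ID=CSQ,", lines, value = TRUE)
  stopifnot(length(hdr) == 1)
  fmt <- sub('.*Format: ([^"]*)".*', "\\1", hdr)
  fields <- strsplit(fmt, "|", fixed = TRUE)[[1]]
  gene_i <- match("Gene", fields)
  feat_i <- match("Feature", fields)
  type_i <- match("Feature_type", fields)
  stopifnot(!is.na(gene_i), !is.na(feat_i), !is.na(type_i))

  # Data records are the lines that do not start with "#"
  body <- lines[!startsWith(lines, "#")]
  out <- lapply(body, function(ln) {
    cols <- strsplit(ln, "\t", fixed = TRUE)[[1]]
    info <- strsplit(cols[8], ";", fixed = TRUE)[[1]]
    csq <- sub("^CSQ=", "", info[startsWith(info, "CSQ=")])
    if (length(csq) == 0) return(NULL)
    # One comma-separated entry per (allele, feature) pair
    parts <- strsplit(strsplit(csq, ",", fixed = TRUE)[[1]], "|", fixed = TRUE)
    gene <- vapply(parts, function(p) p[gene_i], character(1))
    feat <- vapply(parts, function(p) p[feat_i], character(1))
    type <- vapply(parts, function(p) p[type_i], character(1))
    # Keep transcript-level annotations only
    keep <- !is.na(type) & type == "Transcript"
    if (!any(keep)) return(NULL)
    data.frame(
      refsnp_id = cols[3],
      ensembl_gene_stable_id = gene[keep],
      ensembl_transcript_stable_id = feat[keep],
      stringsAsFactors = FALSE
    )
  })
  out <- Filter(Negate(is.null), out)
  if (length(out) == 0) return(NULL)
  res <- unique(do.call(rbind, out))
  rownames(res) <- NULL
  res
}

# Made-up demo records (IDs are placeholders, not real annotations)
demo <- c(
  "##fileformat=VCFv4.2",
  "##INFO=<ID=CSQ,Number=.,Type=String,Description=\"Consequence annotations from Ensembl VEP. Format: Allele|Consequence|Feature_type|Gene|Feature\">",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "1\t1000\trs_demo1\tA\tG\t.\t.\tCSQ=G|missense_variant|Transcript|GENE_A|TX_A1,G|intron_variant|Transcript|GENE_A|TX_A2",
  "1\t2000\trs_demo2\tC\tT\t.\t.\tCSQ=T|intergenic_variant|||"
)
res <- parse_csq_vcf(demo)
print(res)
```

The output columns reuse the attribute names from the getBM() call above, so the result can be filtered against Data$rsID locally without sending any query to the BioMart server. For a file with millions of records, you would read it in chunks (for example with readLines(con, n = ...) on an open connection) rather than loading everything at once.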
I suspected I couldn't do that with a 17 million line file, but I tried it with 100,000 lines a few days ago and it worked. Now it no longer does.
You can't use it for 100,000 either. We recommend a maximum of 500.
It's strange, because I did it once with 100,000 lines. Thank you for your answer. I will divide my 17 million line file into 500-line files to find the matches.
Please don't do that either. You will jam up our servers. I recommend parsing the VCFs.
No problem, I will find another solution.
From Mike Smith's earlier response in the post "vector dimension limit in biomaRt", it seems that there is already an internal function to do the batch work?
I modified my request using the information given in that post, but a new memory error appears:
Biomart version : 2.40.0
Seriously, please don't.
I only made a request for 20,000 rsIDs because, as Mike says, he expanded the search capacity. I requested a single file of 20,000 lines without any loops, just to test Mike's update.
I told you another way.
Parse the VCF
Use the APIs
Use the VEP
Don't blame me when your IP address gets blocked for clogging up our servers.
Thank you for your suggestions, I will explore these tools to find a more suitable one and thus avoid overloading the server.