Problem With Dbsnp Sequence Variation Download Using Biomart
5
3
14.2 years ago

I need to download the following attributes for all entries in dbSNP Build 131 via BioMart:

Ensembl Transcript ID, Variation start in translation (aa), Variation end in translation (aa)

I can get this information for all dbSNP entries, along with other attributes, from the "SEQUENCE VARIATION:" section of BioMart. My query gave proper results when limited to 100 entries, but when I ran it with "All" and the email option, I never got an email from BioMart. I repeated the search and finally received an email with this error message:

Your results file FAILED. Here is the reason why: Error during query execution: Lost connection to MySQL server during query

I would like to know whether any of you have had a similar problem with BioMart when running large queries.
Is there any other way to download SEQUENCE VARIATION information for SNPs from dbSNP?

dbsnp biomart data • 4.4k views
3
14.2 years ago

Yes, I've had this happen with large queries, presumably due to some resource limitation on the server. The solution suggested by Syed Haider was to chunk the queries on a suitable parameter. As you're requesting gene-relative attributes, chunks of 1000 gene IDs would be one approach. First I would run a query that retrieves just the gene IDs of interest, then modify a generated BioMart API Perl script to filter by chunk:

$query->addFilter("ensembl_gene", ["id1","id2"]);

It's not very elegant, but it will work.
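
As a rough illustration of the chunking step only, here is a minimal Python sketch (Python rather than the Perl API, so that it runs without BioMart installed). The `chunked` helper and the gene IDs are made up for the example; each batch would then be passed as the value list of one filtered query.

```python
from typing import Iterator, List, Sequence


def chunked(ids: Sequence[str], size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Hypothetical gene IDs, for illustration only (not real Ensembl IDs).
gene_ids = [f"GENE{i:05d}" for i in range(2500)]

# 2500 IDs in chunks of 1000 give two full batches and one partial batch.
batches = list(chunked(gene_ids, 1000))
```

The chunk size of 1000 is just the figure suggested above; a smaller size may be needed if individual queries still time out.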

0

Thanks Keith. I will split the queries and try it.

3
14.0 years ago
Fiona ▴ 70

hello,

You can now get GVF-format dumps of genome-wide variation data from the Ensembl FTP site, so you don't have to extract the data from BioMart. You can access the data from the variation links on this page: http://www.ensembl.org/info/data/ftp/index.html

Or for a specific species, e.g. cow: ftp://ftp.ensembl.org/pub/current/variation/bos_taurus/

Fiona
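
For anyone reading such a dump afterwards, a minimal Python sketch of a GVF line parser follows. It relies only on the general GVF layout (nine tab-separated columns, with `key=value` pairs separated by `;` in the last column). The `parse_gvf` name and the sample line are invented for illustration, and which attributes the Ensembl dumps actually carry is not confirmed here.

```python
from typing import Dict, Iterable, Iterator


def parse_gvf(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one dict per feature line, skipping pragmas and comments."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # '##' pragmas, '#' comments and blank lines
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        attributes = {}
        for pair in cols[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attributes[key] = value
        yield {
            "seqid": cols[0],
            "source": cols[1],
            "type": cols[2],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            "attributes": attributes,
        }


# Made-up sample lines, not taken from a real Ensembl file.
sample = [
    "##gvf-version 1.06",
    "1\tdbSNP\tSNV\t1000\t1000\t.\t+\t.\tID=1;Variant_seq=A;Reference_seq=G",
]
records = list(parse_gvf(sample))
```

Because the parser is a generator, a whole-genome file can be streamed line by line from an open file handle without loading it into memory.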

2
14.2 years ago
Michael 55k

I have observed the same: the query works for up to 1-2 million SNPs, but it's impossible to retrieve all of them in a single query.

Another option would be to iterate by chromosome; that should be feasible and is also a bit easier to do through the web interface. If using the Perl client, you can apply the filter in a loop over the chromosomes (it worked for chr1 at least):

foreach my $chr (1..22, "X", "Y") {
  # ... config query ...
  $query->addFilter("chr_name", [$chr]);
  # ... retrieval code ...
}

Btw.: It's a bit of a shame that it doesn't work out of the box, because it could be an easy fix for BioMart. It's likely that the query takes too long, so the connection is terminated. The MySQL query is possibly still running on the server though, so dropping the connection isn't really a load-limiting measure and saves little to no resources; you just won't get the results. The MySQL driver in Perl has an option to auto-reconnect, but it's possibly not active. Whatever, that's BM's problem.
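
Since the server side can't be changed from here, one client-side workaround is to retry each chromosome a few times and record the ones that never succeed. Below is a minimal Python sketch of that idea. It is not the driver's auto-reconnect option, and `fetch_all` and `fake_fetch` are hypothetical names, the latter standing in for whatever code actually runs the per-chromosome query.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

# Human chromosomes, matching the Perl loop above.
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y"]


def fetch_all(
    fetch_chromosome: Callable[[str], str],
    chromosomes: Sequence[str] = CHROMOSOMES,
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> Tuple[Dict[str, str], List[str]]:
    """Run one query per chromosome, retrying each up to max_attempts times."""
    results: Dict[str, str] = {}
    failed: List[str] = []
    for chrom in chromosomes:
        for attempt in range(1, max_attempts + 1):
            try:
                results[chrom] = fetch_chromosome(chrom)
                break
            except Exception:  # broad on purpose: any failure triggers a retry
                if attempt == max_attempts:
                    failed.append(chrom)
                else:
                    time.sleep(delay_seconds)
    return results, failed


# Stand-in for a real query: fails once for chromosome 2, then succeeds.
attempts: Dict[str, int] = {}


def fake_fetch(chrom: str) -> str:
    attempts[chrom] = attempts.get(chrom, 0) + 1
    if chrom == "2" and attempts[chrom] == 1:
        raise ConnectionError("Lost connection to MySQL server during query")
    return f"rows for chr{chrom}"


results, failed = fetch_all(fake_fetch)
```

Any chromosome left in `failed` can be re-run later or split further, for example into the gene-ID chunks suggested in the other answer.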

0

Michael, Thanks for this. I hope the BioMart team will fix this soon.

0

Hi, unfortunately I don't think they will...

2
14.2 years ago
Mary 11k

Hey folks--

I was exchanging emails with Arek Kasprzyk (of BioMart) on some other stuff, so I also asked him this. And I asked if I could post the response over here and he said I could. Here's what he said:

"We have recently fixed a number of problems with sequence retrieval in biomart that were due to underlying problems with the backend ensembl database. It would be worth retrying this query. If you still see the problem there please post the bug report to the mart-dev mailing list"

Here's where you can try the mart-dev list: http://www.biomart.org/contact.html

I'm currently running this query again myself via Galaxy. Job is currently running....

0

Thanks a lot for the followup Mary. I will retry my query and update you soon.

0

Hi Khader: I got the same error in Galaxy as I saw before. But you can try yours. I suspect it will have to go to the dev list. But I think it's a good idea to go there: some of the stuff that BioMart (or Galaxy, or Taverna, or any of those types of things) will touch is actually another party's issue. When you pull resources together like this, support is going to get trickier. It's not always going to be straightforward to figure out where the issue resides.

1
14.2 years ago

This applies if you are trying this via the web interface of BioMart.

I had a similar problem previously when downloading big gene lists with BioMart, and the solution was to submit the query as an offline/background job with email confirmation instead of trying to download live.

The problem comes when the query takes more time than the timeout of the server.

Hope this helps, Stephane

0

I think he tried this already?

0

Stephane: as Michael said, I have tried the email option and I have got this error. I contacted the BioMart help-desk, but received no response from them - hence I posted the query here.
