This is likely a local networking issue, but if you post the list of genes, one of us can probably just post the compressed FASTA file somewhere for you.
This is a known problem. With large queries, BioMart is likely to lose its connection to you partway through the download, so you end up with only a partial dataset. There are a couple of solutions. The easiest is to download the data files from the Ensembl FTP site. If you need something specific that is not on the FTP site, you can get BioMart to email you your results rather than downloading them directly. BioMart then doesn't have to stay connected to you while the query runs; it only needs to work internally and send you the results when it is done.
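For the FTP route, a minimal Python sketch of fetching one file is below. The URL is my assumption about where the release 78 mouse CDS file sits, so check it against the actual FTP listing; `download` is just a helper name made up for this example.

```python
import shutil
import urllib.request

# ASSUMPTION: guessed location of the release-78 mouse CDS FASTA file.
# Verify the path in the Ensembl FTP listing before relying on it.
CDS_URL = (
    "https://ftp.ensembl.org/pub/release-78/fasta/mus_musculus/cds/"
    "Mus_musculus.GRCm38.cds.all.fa.gz"
)


def download(url, dest):
    """Stream `url` to the local file `dest` without loading it all into memory."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

Calling `download(CDS_URL, "Mus_musculus.GRCm38.cds.all.fa.gz")` should then leave the compressed file in the working directory, provided the assumed URL is right.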
You can get the CDS sequence from the FTP site. This README summarises what is in the header. You'll need to get the description, gene name, protein ID and CDS length from elsewhere: BioMart with results sent to you via email is probably your best bet.
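Once you have the CDS file, something like the sketch below pulls out the transcript ID, the gene ID and the chromosome location from each header. It assumes headers of the form `>TRANSCRIPT_ID cds:... chromosome:... gene:GENE_ID`, so check that against the README. It also counts the bases in each record, which gives you the CDS length directly. `parse_cds_fasta` and `WANTED` are names invented for this example.

```python
# ASSUMPTION: header fields are space-separated "key:value" tokens after the ID.
WANTED = ("chromosome", "gene")


def parse_cds_fasta(lines):
    """Yield one dict per FASTA record: id, selected header fields, CDS length."""
    record = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if record is not None:
                yield record
            tokens = line[1:].split()
            record = {"id": tokens[0], "cds_length": 0}
            for token in tokens[1:]:
                key, sep, value = token.partition(":")
                if sep and key in WANTED:
                    record[key] = value
        elif record is not None:
            # Sequence lines: accumulate the number of bases.
            record["cds_length"] += len(line)
    if record is not None:
        yield record
```

To run it on the compressed file, open it in text mode with `gzip.open(path, "rt")` and pass the file object to `parse_cds_fasta`.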
written 9.8 years ago by Emily • updated 2.6 years ago by Ram
Unfortunately, the bigger your queries, the more likely you are to run into problems like this. The best solution, as suggested by Emily, is to keep all the data locally. I ran into the same problem several times, which is why I wrote a Python package that downloads all the data Ensembl makes available for a reference genome and stores it in a local database. You can find it here.
And here's the code to do what you want. First the importation:
from pyGeno.importation.Genomes import *
importGenome("Mus_musculus.GRCm38.78.tar.gz")
And then:
from pyGeno.Genome import *
# Load the imported reference genome by name
ref = Genome(name = "GRCm38.78")
# Iterate over every transcript and print the requested fields
for trans in ref.iterGet(Transcript):
    print(trans.gene.id)
    print(trans.id)
    print(trans.cDNA)
    print(trans.gene.biotype)
    print(trans.gene.name)
    print(trans.protein.id)
    print(len(trans.cDNA))
You can find the datawrap (package) for Mus musculus here.
Are you behind a proxy server?
Yes, I am connecting through a proxy.
I needed the following attributes, which I selected in BioMart: