Hello everyone,
I have downloaded the human transcripts using BioMart (from the Ensembl website). I was just wondering why I only get 1000 sequences; I would expect many more.
Research details:
Dataset: Homo sapiens genes
Filters: limit to Ensembl human transcript IDs (only)
Attributes: Ensembl gene ID, Ensembl transcript ID, unspliced transcripts
I'd recommend very much the opposite: do use filters. The more the better, actually. BioMart is not the tool for retrieving all the human genes (or transcripts), their sequences, or anything else genome-wide. That is why you get the warning that a maximum of 500 is advised when you input your data (a list of gene IDs, for example). Also, the only format for downloading sequences is FASTA; there is no TSV option.
The likely reason you only got 1000 sequences is that the web interface timed out because of the huge number of results it had to process. If you want to use the web interface, I'd recommend running the query in chunks (per chromosome, for example, by selecting a filter under REGION). You can also ask for the results to be sent to you by email in a compressed format once they are ready. That file should contain all the sequences you are after, not 'just' the first 1000.
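If you prefer to script the per-chromosome chunks rather than click through the web interface, here is a minimal sketch. It is not an official recipe: the dataset name (`hsapiens_gene_ensembl`), the filter name (`chromosome_name`), the attribute names (including `transcript_exon_intron` for unspliced transcripts) and the `martservice` URL are assumptions that you should check against the XML that BioMart shows for your own query. The helper names `build_query`, `build_url` and `download_all` are made up for this example.

```python
import urllib.request
from urllib.parse import urlencode

# Assumed values: verify them against the XML generated by the BioMart web page.
MART_URL = "https://www.ensembl.org/biomart/martservice"
DATASET = "hsapiens_gene_ensembl"
ATTRIBUTES = ["ensembl_gene_id", "ensembl_transcript_id", "transcript_exon_intron"]
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y", "MT"]


def build_query(chromosome):
    """Return a BioMart XML query restricted to a single chromosome."""
    attrs = "".join(f'<Attribute name="{a}"/>' for a in ATTRIBUTES)
    return (
        '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
        '<Query virtualSchemaName="default" formatter="FASTA" header="0" '
        'uniqueRows="0" datasetConfigVersion="0.6">'
        f'<Dataset name="{DATASET}" interface="default">'
        f'<Filter name="chromosome_name" value="{chromosome}"/>'
        f"{attrs}</Dataset></Query>"
    )


def build_url(chromosome):
    """Return the request URL with the XML query URL-encoded."""
    return MART_URL + "?" + urlencode({"query": build_query(chromosome)})


def download_all(out_prefix="transcripts_chr"):
    """Fetch one FASTA file per chromosome (makes network requests)."""
    for chrom in CHROMOSOMES:
        urllib.request.urlretrieve(build_url(chrom), f"{out_prefix}{chrom}.fa")
```

Calling `download_all()` would write one file per chromosome; each request is much smaller than the genome-wide query, so it is less likely to hit the timeout described above.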
Do you know of another tool for genome-wide comparisons among a few species? I had a similar problem with my compressed file from BioMart: Microsoft Excel cannot open it correctly, so many genes do not appear in the results. I do not need the sequences, but I do need the orthologs.
Thank you.
Hello @gabrielabcg, the same applies whether you download a table of orthologs or sequences with BioMart. A) You should restrict the query as much as you can by applying filters. B) You could choose to have the results sent to you by email in a compressed format. In your case, you could also use this HTTPS call to access the Ensembl REST API. You can paste that URL into a browser's address bar or use this command in a terminal window. If you would rather continue with BioMart and still get a corrupted file sent to you, it may be worth contacting the Ensembl helpdesk.
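Before concluding that the emailed file is corrupted, it may help to inspect it outside Excel. A worksheet holds at most 1,048,576 rows, so a large table can look incomplete there even when the file is fine. Here is a minimal sketch, assuming the file is a gzip-compressed, tab-separated table with a header row; the function name `summarize_table` and the file name in the usage line are placeholders, not anything BioMart defines.

```python
import csv
import gzip


def summarize_table(path):
    """Return (header, number_of_data_rows) for a gzip-compressed TSV file."""
    # "rt" opens the compressed file as text; newline="" lets csv handle line endings.
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        n_rows = sum(1 for _ in reader)
    return header, n_rows
```

For example, `header, n = summarize_table("mart_export.txt.gz")` gives the column names and the row count. If the count is plausible for your query, the problem is more likely in how Excel opens the file than in the download itself.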
I would recommend not filtering anything.
Dataset: Homo sapiens genes. Attributes: Ensembl gene ID, Ensembl transcript ID.
First get the data as TSV, and then just remove the duplicates.
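The "remove the duplicates" step can be done without a spreadsheet. Here is a minimal sketch, assuming an uncompressed, tab-separated export small enough that its unique rows fit in memory; the function name `write_unique_rows` is a placeholder chosen for this example.

```python
import csv


def write_unique_rows(in_path, out_path):
    """Copy a TSV file, keeping only the first occurrence of each row.

    Returns the number of unique rows written (header included).
    """
    seen = set()
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        for row in reader:
            key = tuple(row)  # lists are not hashable, tuples are
            if key not in seen:
                seen.add(key)
                writer.writerow(row)
    return len(seen)
```

Row order is preserved, so the header line stays at the top of the output.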