Could you please tell me how to download all the cDNA sequences of the entries in the TrEMBL and nr databases?
Hi,
You could use BioMart: choose the database Ensembl Genes and then the particular species. In the Filters section on the left side, go to Gene and select "Limit to genes... With UniProtKB/TrEMBL Accession(s)". Then, under Attributes, select the Sequences radio button, which has the option for cDNA sequences, and pick the attributes you want to download.
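If you would rather script the same selection than click through it, here is a minimal sketch that builds the equivalent BioMart XML query and the URL to request it from. Note the assumptions: the dataset name hsapiens_gene_ensembl, the filter name with_uniprotsptrembl and the attribute names are my guesses at the internal names behind the menu entries above, so please check them against the XML button in the BioMart interface for your species and release. The code only builds the query; it does not download anything.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Ensembl BioMart service endpoint.
MART_URL = "https://www.ensembl.org/biomart/martservice"


def build_query(dataset="hsapiens_gene_ensembl",
                trembl_filter="with_uniprotsptrembl"):
    """Build a BioMart XML query for cDNA sequences of genes with a TrEMBL accession.

    The dataset, filter and attribute names are assumptions; verify them
    with the XML button in the BioMart web interface.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "FASTA",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Boolean filter: excluded="0" keeps only genes that have the cross-reference.
    ET.SubElement(ds, "Filter", {"name": trembl_filter, "excluded": "0"})
    # Identifiers go into the FASTA header, "cdna" is the sequence itself.
    for attr in ("ensembl_gene_id", "ensembl_transcript_id", "cdna"):
        ET.SubElement(ds, "Attribute", {"name": attr})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def build_url(xml_query):
    """Return the GET URL that submits the query to the mart service."""
    return MART_URL + "?" + urllib.parse.urlencode({"query": xml_query})


if __name__ == "__main__":
    print(build_url(build_query()))
```

You can paste the printed URL into a browser or pass it to wget or curl. For a whole genome the result is large, so expect the download to take a while.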
This is the easy and fast way. You could also use the Ensembl Perl API if you would like to customize the query and batch-download for multiple species.
PS: This is a targeted search of the Ensembl database, so it may not be fully up to date with the most recently updated records in UniProtKB/TrEMBL.
For UniProtKB (UniProtKB/Swiss-Prot + UniProtKB/TrEMBL), the set of source coding sequences is equivalent to all the CDS features in EMBL-Bank.
The European Nucleotide Archive (ENA) provides a set of data files for ENA Coding sequences (formerly known as EMBLCDS), which are available from the EMBL-EBI FTP site:
For what it is worth, ENA also provides an equivalent dataset for non-coding RNA features appearing in EMBL-Bank entries: