Hi all, I would like to download all the protein coding genes from Drosophila melanogaster, either from GenBank or any other database. How can I get all the genes in a single file in FASTA or .txt format?
From FlyBase, select one of the following links as appropriate for your task:
EnsemblGenomes ftp site: ftp://ftp.ensemblgenomes.org/pub/metazoa/current/fasta/drosophila_melanogaster/pep/Drosophila_melanogaster.BDGP5.pep.abinitio.fa.gz
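Once a file like the one above is downloaded, it can be read without decompressing it first. Here is a minimal sketch using only the Python standard library; the function name `read_fasta` is my own, not part of any package, and it assumes an ordinary FASTA layout (a `>` header line followed by one or more sequence lines).

```python
import gzip


def read_fasta(path):
    """Yield (header, sequence) pairs from a plain or gzip-compressed FASTA file.

    Hypothetical helper for illustration: assumes '>' header lines followed
    by one or more sequence lines, and ignores blank lines.
    """
    # Pick the opener from the file extension; both are opened in text mode.
    opener = gzip.open if str(path).endswith(".gz") else open
    header, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    # Emit the final record.
    if header is not None:
        yield header, "".join(chunks)


# Example use (the path is whatever you saved the download as):
# n_records = sum(1 for _ in read_fasta("Drosophila_melanogaster.BDGP5.pep.abinitio.fa.gz"))
```

Counting the records this way is a quick sanity check that the download is complete and contains what you expect before using it downstream.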
I'm really liking the FlyMine interface: http://www.flymine.org/ Try the query builder to generate a custom query that suits your needs.
Edit: as per Casey's request, here's what I did with the query builder. Started with proteins, asked for melanogaster, and amino acids. You may want to add more, or constraints, or other things. But this is the basic structure. Gave me: Total rows:31904
If the real question is about cDNA or genomic DNA, it may take some tweaking from there.
@Mary, I'd like to +1 this solution, but try as I might (and I have many times) I find the FlyMine interface confusing and non-intuitive, except for the canned queries. Could you provide a more explicit answer of how to do a simple check-out operation like this using FlyMine, a la @Ido's response for BioMart below?
FlyMine has Templates available for this kind of common query so you shouldn't need to use the QueryBuilder.
There is one to fetch all proteins for an organism: http://www.flymine.org/query/template.do?name=Organism_Protein
And I just added a new one to fetch all the CDSs: http://www.flymine.org/query/template.do?name=Gene_CDSs
From the results menu, click on export then select FASTA to get a file with all the sequences.
Any database is a dangerous specification
Biomart:
UCSC:
Not sure why this was down-voted. The question asks for "protein coding genes" but does not specify what level of gene structure is desired (e.g. peptides, transcripts, genes). Assuming that gene==protein (as is all too often done) is in general not a safe thing to do.
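To make the gene-versus-protein point concrete: one gene can give rise to several protein isoforms, so the number of records in a peptide FASTA file is usually larger than the number of genes. Below is a rough sketch that groups peptide records by gene. It assumes the headers carry a `gene:<ID>` field, as Ensembl-style peptide headers typically do; that assumption, the function name `proteins_per_gene`, and the IDs in the example are illustrative only, so check the headers of your own file first.

```python
import re
from collections import defaultdict

# Assumption: headers contain a field of the form "gene:<ID>".
GENE_FIELD = re.compile(r"\bgene:(\S+)")


def proteins_per_gene(headers):
    """Group protein IDs by gene ID parsed from FASTA header lines.

    Hypothetical helper: the protein ID is taken to be the first
    whitespace-separated token; headers without a gene field are
    grouped under their own protein ID.
    """
    groups = defaultdict(list)
    for header in headers:
        if not header.strip():
            continue
        protein_id = header.split()[0].lstrip(">")
        match = GENE_FIELD.search(header)
        gene_id = match.group(1) if match else protein_id
        groups[gene_id].append(protein_id)
    return dict(groups)


# Made-up headers: three proteins, but only two genes.
example = [
    ">FBpp1 pep:known gene:FBgn1 transcript:FBtr1",
    ">FBpp2 pep:known gene:FBgn1 transcript:FBtr2",
    ">FBpp3 pep:known gene:FBgn2 transcript:FBtr3",
]
grouped = proteins_per_gene(example)
print(len(example), "proteins,", len(grouped), "genes")
```

If what you actually need is one sequence per gene, you then have to decide which isoform to keep (the longest is a common choice), which is exactly why the level of gene structure matters.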
+1 from me, best
Thanks Casey Bergman