Is there an automatic way to get the fasta sequences of all sequenced (preferably completely) genomes within a taxonomic group?
And how can I get the taxid for all of these organisms as well?
Thank you.
This is easily accomplished with NCBI's Assembly resource: https://www.ncbi.nlm.nih.gov/assembly/?term=bacteria%5Borgn%5D+latest_refseq%5Bfilter%5D+complete_genome%5Bfilter%5D
From the results page you can download FASTA, annotation, or other files using the big blue "Download Assemblies" button.
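If you would rather script the same search than click through the web page, the query can be sent to NCBI's E-utilities `esearch` endpoint. The following is only a minimal sketch: the function name `assembly_search_url` and the `retmax` default are my own choices, and the filter names are simply copied from the URL above. Note that `esearch` only returns assembly UIDs, so a follow-up request would still be needed to get metadata or files.

```python
from urllib.parse import urlencode

# Public E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def assembly_search_url(taxon, filters=("latest_refseq", "complete_genome"), retmax=500):
    """Build an esearch URL for the Assembly database (hypothetical helper).

    The search term mirrors the web query: <taxon>[orgn] plus one
    [filter] clause per entry in `filters`, joined with an explicit AND.
    """
    clauses = [f"{taxon}[orgn]"] + [f"{name}[filter]" for name in filters]
    params = {
        "db": "assembly",
        "term": " AND ".join(clauses),
        "retmax": retmax,      # assumed page size; raise it or page with retstart
        "retmode": "json",
    }
    return EUTILS_ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    # Prints the URL only; fetching it (e.g. with urllib.request) is left to the reader.
    print(assembly_search_url("bacteria"))
```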
Note that "complete genome" is a useful filter for bacteria, but only a handful of eukaryote assemblies (mostly fungi) are sequenced to completion. If you're interested in eukaryotes, you may want to either focus on assemblies at the "chromosome" level (to exclude WGS assemblies that are just bags of scaffolds), or use the "exclude partial" filter to drop the small number of assemblies that cover only a subset of the genome (e.g. just one chromosome).
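For the taxid part of the question, one option is the tab-separated `assembly_summary.txt` files that NCBI publishes on its genomes FTP site (for example https://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt). Below is a rough sketch of how such a file could be filtered by assembly level and genome representation, along the lines described above. It assumes the file has a `#`-prefixed header row with the columns `assembly_accession`, `taxid`, `organism_name`, `assembly_level`, `genome_rep` and `ftp_path`, and that the genomic FASTA sits at `<ftp_path>/<directory name>_genomic.fna.gz`; check both against the current README before relying on it. The function name `parse_assembly_summary` is my own.

```python
import csv


def parse_assembly_summary(text, levels=("Complete Genome",), full_only=True):
    """Yield (accession, taxid, organism, fasta_url) for matching assemblies.

    `text` is the content of an assembly_summary.txt file. Columns are
    looked up by name from the header row, so column order does not matter.
    """
    header = None
    data = []
    for line in text.splitlines():
        if line.startswith("#"):
            if not data:
                # The last comment line before the data is taken as the header.
                header = line.lstrip("#").strip().split("\t")
        elif line.strip():
            data.append(line)
    if header is None:
        raise ValueError("no '#'-prefixed header row found")

    reader = csv.DictReader(data, fieldnames=header, delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    for row in reader:
        if row["assembly_level"] not in levels:
            continue  # e.g. skip Scaffold/Contig-level WGS assemblies
        if full_only and row["genome_rep"] != "Full":
            continue  # same idea as the "exclude partial" filter
        base = (row["ftp_path"] or "").rstrip("/")
        if not base or base == "na":
            continue  # no download location listed
        name = base.rsplit("/", 1)[-1]
        # Assumed naming convention for the genomic FASTA file.
        fasta_url = f"{base}/{name}_genomic.fna.gz"
        yield row["assembly_accession"], row["taxid"], row["organism_name"], fasta_url
```

For eukaryotes you could pass something like `levels=("Complete Genome", "Chromosome")` to keep chromosome-level assemblies as well.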
For Ensembl, there is no dedicated API for this that I know of. If you are specifically interested in bacteria from Ensembl Genomes, here is a hackish script you can adapt.