The genomes on the NCBI FTP site (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/) are listed in alphabetical order with the BioProject ID at the end of each folder name, and there is no taxonomic information in the names. Is there a way to download only the genomes that belong to a specific phylum? For example, how do I download all the genome folders that belong to Actinobacteria?
Wow, I didn't realize this. My bad. Thanks. Is there summary information for DRAFT too? I scrolled through the folder but didn't see it.
For planctomycete_KSU_1_uid163683, I found it in the gbk file ftp://ftp.ncbi.nih.gov/genomes/Bacteria_DRAFT/planctomycete_KSU_1_uid163683/NZ_BAFH00000000.gbk as /db_xref="taxon:247490"
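If it helps, here is a minimal sketch of pulling that taxon ID out of GenBank text with a regular expression. The function name `taxon_ids_from_genbank` and the sample text are made up for illustration; only the `/db_xref="taxon:247490"` qualifier comes from the file mentioned above.

```python
import re

# Matches the qualifier form seen in the gbk file: /db_xref="taxon:247490"
TAXON_RE = re.compile(r'/db_xref="taxon:(\d+)"')

def taxon_ids_from_genbank(text):
    """Return taxon IDs found in GenBank text, in order, without duplicates."""
    seen = []
    for match in TAXON_RE.finditer(text):
        taxon_id = int(match.group(1))
        if taxon_id not in seen:
            seen.append(taxon_id)
    return seen

# Illustrative fragment, not a full record
sample = (
    "     source          1..100\n"
    '                     /db_xref="taxon:247490"\n'
)
print(taxon_ids_from_genbank(sample))
```

The returned IDs would still need to be mapped to a phylum some other way, for example through the NCBI taxonomy, which this sketch does not attempt.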
Yes, that's for an individual genome. A summary like the one on the complete genome FTP would have been better.
Right now I have rsync set up between the FTP site and my database, but the list of genomes is only semi-automatic. I can use the summary file on the complete genome FTP to create a list of actinos and make the list fully automatic, but I am confused about how I would do it for the draft genome FTP. Do I have to read in all the .gbk files for each organism in that folder?
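One possible way to approach the draft folder, sketched below under a few assumptions: the .gbk files are already available locally (for example from the rsync mirror), each organism has its own folder, and the GenBank header lists the taxonomic lineage on the indented lines under ORGANISM. If all records in a folder belong to the same organism, reading the header of one .gbk per folder should be enough. The helper names `lineage_from_genbank` and `folders_in_phylum`, and the demo folder and header, are hypothetical.

```python
import os
import tempfile

def lineage_from_genbank(path):
    """Return the lineage terms listed under ORGANISM in a GenBank header."""
    lineage_lines = []
    in_organism = False
    with open(path) as handle:
        for line in handle:
            if line.startswith("  ORGANISM"):
                in_organism = True
                continue  # this line holds the organism name, not the lineage
            if in_organism:
                # Lineage lines are indented by 12 spaces; anything else ends the block.
                # Caveat: a wrapped organism name would also be picked up here.
                if line.startswith(" " * 12):
                    lineage_lines.append(line.strip())
                else:
                    break
    text = " ".join(lineage_lines).rstrip(".")
    return [term.strip() for term in text.split(";") if term.strip()]

def folders_in_phylum(root, phylum):
    """List organism folders under root whose first .gbk lineage names the phylum."""
    hits = []
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        if not os.path.isdir(folder):
            continue
        gbk_files = sorted(f for f in os.listdir(folder) if f.endswith(".gbk"))
        if gbk_files and phylum in lineage_from_genbank(
            os.path.join(folder, gbk_files[0])
        ):
            hits.append(name)
    return hits

# Small demo with a made-up folder and header (not real NCBI data)
header = (
    "LOCUS       DEMO\n"
    "SOURCE      demo organism\n"
    "  ORGANISM  demo organism\n"
    "            Bacteria; Actinobacteria; Actinobacteridae;\n"
    "            Actinomycetales.\n"
    "REFERENCE   1\n"
)
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "demo_uid1"))
    with open(os.path.join(root, "demo_uid1", "demo.gbk"), "w") as out:
        out.write(header)
    demo_hits = folders_in_phylum(root, "Actinobacteria")
print(demo_hits)
```

The resulting folder names could then be written to a text file and used as the list that drives the rsync job.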