My lab has downloaded several E. coli assemblies from NCBI, gathered from various sources. Now that I am looking at the data, I am trying to retrieve the metadata. For most of them I can get it from the assembly_summary file that can be downloaded from the NCBI FTP site, but some do not appear even in the latest version of that file.
Example: GCA_018564605.1 does not exist in the assembly_summary file but if I look for it in the NCBI portal, it is there! https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_018564605.1/
Is it because the record is not curated? How can I retrieve these?
$ grep GCA_018564605 assembly_summary_genbank.txt
GCA_018564605.1 PRJNA514245 SAMEA7577678 DAEFSU000000000.1 na 562 562 Escherichia coli strain=110504014 110504014 latest Contig Major Full 2021/05/27 PDT001039592.1 National Center for Biotechnology Information na na https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/018/564/605/GCA_018564605.1_PDT001039592.1 from large multi-isolate project na na haploid bacteria 5111194 5111194 50.500000 0 66 66 NCBI NCBI Prokaryotic Genome Annotation Pipeline (PGAP) 2021/05/17 4979 4757 93 30286803
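If the other columns are needed by name, the same lookup can be done in Python. This is only a minimal sketch, assuming the file is tab-separated and that its last `#`-prefixed line holds the column names; `find_assembly` is a hypothetical helper name, not part of any NCBI tool.

```python
def find_assembly(path, accession):
    """Return the matching row as a dict keyed by column name, or None."""
    header = None
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#"):
                # Assumption: the last comment line before the data rows
                # carries the column names (e.g. "#assembly_accession").
                fields[0] = fields[0].lstrip("# ")
                header = fields
                continue
            # Match with or without the version suffix (".1").
            if fields[0] == accession or fields[0].split(".")[0] == accession:
                if header is None:
                    # No header seen: fall back to positional keys.
                    header = [str(i) for i in range(len(fields))]
                return dict(zip(header, fields))
    return None
```

Calling `find_assembly("assembly_summary_genbank.txt", "GCA_018564605")` returns `None` when the accession is absent, which makes it easy to list the missing ones before querying NCBI for them.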
If you have the accession numbers, can you not use something like eutils?
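A rough sketch of what that could look like with the E-utilities `esearch` and `esummary` endpoints against the `assembly` database, using only the Python standard library. The helper names (`build_url`, `fetch_json`, `assembly_summary`) are made up for illustration, and the code assumes the JSON response layout (`esearchresult.idlist`, `result.uids`) that E-utilities returns with `retmode=json`.

```python
import json
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_url(endpoint, **params):
    """Build an E-utilities URL such as .../esearch.fcgi?db=assembly&..."""
    return f"{EUTILS}/{endpoint}.fcgi?{urllib.parse.urlencode(params)}"


def fetch_json(url):
    """Download a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def assembly_summary(accession, fetch=fetch_json):
    """Return a list of esummary records for one assembly accession.

    `fetch` is injectable so the function can be tried without network access.
    """
    # Step 1: map the accession to the internal UID(s) of the assembly database.
    search = fetch(build_url("esearch", db="assembly", term=accession,
                             retmode="json"))
    uids = search["esearchresult"]["idlist"]
    if not uids:
        return []
    # Step 2: pull the document summaries for those UIDs.
    summary = fetch(build_url("esummary", db="assembly", id=",".join(uids),
                              retmode="json"))
    result = summary["result"]
    return [result[uid] for uid in result["uids"]]
```

`assembly_summary("GCA_018564605.1")` should then give back one dictionary per matching record; inspect its keys to see which metadata fields are available rather than relying on specific field names. For many accessions, NCBI asks that requests be rate limited (an API key raises the limit), so add a short pause between calls.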