Hi everybody, could anybody explain to me why these nucleotide entries in the GenBank database don't have an assembly accession number?
http://www.ncbi.nlm.nih.gov/nuccore/NC_023274.1?from=4015&to=4638&strand=2 http://www.ncbi.nlm.nih.gov/nuccore/NC_001735.4?from=34478&to=35101&strand=2
and this one does?
http://www.ncbi.nlm.nih.gov/nuccore/NZ_JTZL01000048.1?from=10765&to=11388
I am trying to understand this because I have to assign assembly accession numbers to a list of nucleotide accession numbers, and some of them don't have this data.
Thanks!
The first two entries are from RefSeq and are validated/curated (note the NC* accession number). The third entry is from a WGS dataset. It was automatically annotated by NCBI's prokaryotic annotation pipeline (as the notes in the record indicate).
Thank you for the answer! So the first two entries are not part of the Assembly database, are they? This is a problem for me: if I want to know how many genomes of a species there are in NCBI and I search the Assembly database, I am not counting entries like these two. How can I get the exact number of genomes of a certain species, regardless of assembly level or whether they are curated?
Are you referring to the "WGS" section of GenBank as the "assembly database"? RefSeq sequences are just that: references that are stable/curated.
A list of all genomes in NCBI is in this file. A similar list is available for RefSeq genomes as well.
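If it helps, here is a minimal sketch of counting the genomes of one species from such a list. It assumes the file is a tab-delimited assembly summary table whose column header is the last line starting with `#`, and that it has columns named `species_taxid` and `assembly_level`. Those column names and the function name `count_assemblies` are assumptions for illustration, so check them against the header of the file you actually download. The sample rows are made up and are not real NCBI records.

```python
import io
from collections import Counter


def count_assemblies(lines, species_taxid):
    """Count assemblies of one species, grouped by assembly level.

    `lines` is any iterable of text lines (an open file works).
    Column names are assumed, not guaranteed by the source file.
    """
    header = None
    levels = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Treat the last '#' line before the data as the column header.
            header = line.lstrip("#").strip().split("\t")
            continue
        if header is None:
            raise ValueError("no header line found before the first data row")
        row = dict(zip(header, line.split("\t")))
        if row["species_taxid"] == species_taxid:
            levels[row["assembly_level"]] += 1
    return levels


# Made-up example rows (hypothetical accessions), one row per assembly.
SAMPLE = (
    "#   made-up example rows, not real NCBI records\n"
    "# assembly_accession\tspecies_taxid\torganism_name\tassembly_level\n"
    "GCA_000000001.1\t562\tEscherichia coli\tComplete Genome\n"
    "GCA_000000002.1\t562\tEscherichia coli str. X\tContig\n"
    "GCA_000000003.1\t562\tEscherichia coli\tContig\n"
    "GCA_000000004.1\t1280\tStaphylococcus aureus\tScaffold\n"
)

levels = count_assemblies(io.StringIO(SAMPLE), "562")
print(sum(levels.values()), dict(levels))
# prints: 3 {'Complete Genome': 1, 'Contig': 2}
```

Filtering on a species-level taxonomy ID rather than the organism name keeps strain-level entries in the count, and grouping by assembly level shows how many of the genomes are complete versus draft. For a real file, pass an open file handle instead of the `io.StringIO` sample.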