8.2 years ago
wanderingstefan
Hey,
I have a fairly basic question but cannot really find an answer online. When an NCBI sequence is labeled 'complete genome', what does it actually contain? Assuming we have a bacterial sequence, will it contain only the chromosomal sequence, or will it contain both the chromosomal and plasmid sequences, and thus the complete DNA found in the cell?
That said, this may not be true (any longer?); see the related thread by @wanderingstefan: Download complete bacterial genomes and associated plasmid sequences from NCBI
Can you name at least one example where the above does not apply?
Since @wanderingstefan had posted this and the other thread, I (wrongly) assumed that it was done after due diligence. On double-checking, it does look like the "genomic.fna.gz" file contains the associated plasmid sequences.
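One quick way to double-check a downloaded "genomic.fna.gz" file is to list its FASTA headers and see which records are plasmids. Below is a minimal Python sketch of that check. It assumes (heuristically) that a record is a plasmid whenever the word "plasmid" appears in its header, as in typical NCBI titles such as "... plasmid pXYZ, complete sequence"; the function names are illustrative only.

```python
import gzip
from typing import Dict, Iterable, List


def classify_headers(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Split FASTA header lines into plasmid and other (e.g. chromosome) records.

    Heuristic (assumption): a header mentioning "plasmid" is a plasmid record.
    """
    groups: Dict[str, List[str]] = {"plasmid": [], "other": []}
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence lines are not needed for this check
        header = line[1:].strip()
        key = "plasmid" if "plasmid" in header.lower() else "other"
        groups[key].append(header)
    return groups


def classify_records(fna_gz_path: str) -> Dict[str, List[str]]:
    """Apply classify_headers to a gzipped FASTA file such as *_genomic.fna.gz."""
    with gzip.open(fna_gz_path, "rt") as handle:
        return classify_headers(handle)
```

Calling `classify_records("some_assembly_genomic.fna.gz")` (hypothetical file name) returns both groups of headers; an assembly with plasmids should show one or more entries under `"plasmid"` next to its chromosome record(s).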
His problem was going through Entrez. I'm pretty sure nobody, even at NCBI, knows comprehensively how Entrez queries work; at least, it is not fully documented anywhere.
I am a little confused here. Does the above answer also apply to complete genomes downloaded from the 'nucleotide' database at NCBI? My statement that plasmid sequences are not contained in the 'complete genome' files from the 'nucleotide' database was based on a BLAST search of some whole genomes against a BLAST database containing the sequences of all plasmids in NCBI RefSeq, followed by calculating the sequence coverage of each plasmid. May I ask how you checked, @genomax2?
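The coverage step described above can be sketched in Python as follows. This is an illustration, not the poster's actual script: it assumes BLAST+ was run with `-outfmt "6 qseqid sseqid sstart send slen"` (five tab-separated columns), and it pools all hits in the input, so it should be run once per genome's BLAST output. Coverage is the fraction of each plasmid covered by the union of its hit intervals.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def plasmid_coverage(blast_lines: Iterable[str]) -> Dict[str, float]:
    """Fraction of each subject (plasmid) covered by at least one BLAST hit.

    Assumes tabular output from -outfmt "6 qseqid sseqid sstart send slen".
    """
    intervals: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    lengths: Dict[str, int] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        _qseqid, sseqid, sstart, send, slen = line.rstrip("\n").split("\t")
        # Minus-strand hits report sstart > send, so normalise the order.
        start, end = sorted((int(sstart), int(send)))
        intervals[sseqid].append((start, end))
        lengths[sseqid] = int(slen)

    coverage: Dict[str, float] = {}
    for sseqid, spans in intervals.items():
        spans.sort()
        covered = 0
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end + 1:  # overlapping or adjacent: extend block
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        covered += cur_end - cur_start + 1
        coverage[sseqid] = covered / lengths[sseqid]
    return coverage
```

Merging overlapping intervals before summing avoids counting the same plasmid bases twice when several contigs or HSPs hit the same region; where to set the coverage threshold for calling a plasmid "present" is left to the user.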
Edit: I should add that I stopped the analysis after around 100 random genomes, as I was unable to identify plasmids in any of them. I will check this again.
It applies when you look up your assemblies of interest in this large file (do not open it in a browser!) and then download the "*_genomic.fna.gz" file found in the FTP directory specified in column 20 of that file.
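Those steps can be sketched in Python. The sketch assumes the linked file is an NCBI `assembly_summary.txt`-style table: tab-separated, comment/header lines starting with `#`, `assembly_level` in column 12 and `ftp_path` in column 20 (1-based). The function name and the default "Complete Genome" filter are illustrative choices, not part of the original answer.

```python
import posixpath
from typing import Iterable, List


def genomic_fna_urls(summary_lines: Iterable[str],
                     assembly_level: str = "Complete Genome") -> List[str]:
    """Build *_genomic.fna.gz URLs from an assembly_summary.txt-style table.

    Assumptions: tab-separated rows, '#' header/comment lines,
    column 12 = assembly_level, column 20 = ftp_path (1-based).
    """
    urls: List[str] = []
    for line in summary_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 20:
            continue  # malformed or truncated row
        level, ftp_path = fields[11], fields[19].rstrip("/")
        if level != assembly_level or ftp_path in ("", "na"):
            continue
        # The file inside each assembly directory is named after the
        # directory itself: <dir>/<dir>_genomic.fna.gz
        basename = posixpath.basename(ftp_path)
        urls.append(f"{ftp_path}/{basename}_genomic.fna.gz")
    return urls
```

The resulting URLs can then be fetched with any downloader (e.g. `wget` or `urllib.request`); each file holds every sequence of that assembly, including plasmids when present.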
Hey, thanks for the clarification. Yes, those files contain all the plasmid sequences; I also found the file and downloaded the suitable assemblies.