Hi everyone,
I am fairly new to bioinformatics and I am trying to wrap my head around genome sequencing and assembly. I need to analyze the genome of a bacterium called Shewanella benthica KT99 to predict the genes it encodes. However, the reported assembly level of the genome is "contig". I understand that contigs are fragments of the genome for which the order of the bases is known to be correct. What I don't quite understand is why the authors of the paper reporting the draft genome sequence could not assemble it to the "complete genome" level. On another note, there is a very closely related bacterium, Shewanella piezotolerans WP3, which has an assembly level of "complete genome" on NCBI. Both were sequenced using ABI 3730 family DNA sequencers, only a year or two apart. Why are the assembly levels different? The details I got from NCBI for my bacterium of interest are below.
So far I have used an established pipeline that works with complete bacterial genome sequences. Is there a way to just take all these contigs and assemble them together to obtain the full genome of this bacterium? If so, how do I do it, and which open-source software packages should I use?
Thank you in advance!
Anby
*ASM17207v1
Organism name: Shewanella benthica KT99 (g-proteobacteria)
Infraspecific name: Strain: KT99
BioSample: SAMN02436096
BioProject: PRJNA13387
Submitter: The Gordon and Betty Moore Foundation Marine Microbiology Initiative
Date: 2007/11/28
Assembly level: Contig
Genome representation: full
RefSeq category: representative genome
GenBank assembly accession: GCA_000172075.1 (latest)
RefSeq assembly accession: GCF_000172075.1 (latest)
RefSeq assembly and GenBank assembly identical: yes
WGS Project: ABIC01*
Thank you for the explanations, I appreciate it. I will try contacting the authors to ask whether the original sequencing reads are available. What would be a good program of choice when it comes to assembly?
On another note, can the sequencing coverage and/or quality be improved by repeating the sequencing process multiple times?
Anby
Maybe, but short reads can only do so much to resolve long repeats. Paired-end data will help more than single-end reads.
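To put rough numbers on the coverage part of the question, here is a minimal sketch of the usual back-of-the-envelope calculation: average depth is (number of reads × read length) / genome size, and under the idealized Lander-Waterman model the fraction of bases that no read covers is about e^(-coverage). The genome size, read length, and read counts below are hypothetical placeholders, not values from the KT99 project. The model also assumes reads land uniformly at random, so it says nothing about repeats, which is why extra depth alone may not close the gaps.

```python
import math


def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def expected_uncovered_fraction(coverage: float) -> float:
    """Lander-Waterman estimate of the fraction of bases hit by no read.

    Assumes reads are placed uniformly at random; repeats are ignored.
    """
    return math.exp(-coverage)


if __name__ == "__main__":
    # Hypothetical values for illustration only (not taken from the KT99 record).
    GENOME_SIZE = 4_000_000  # bases
    READ_LENGTH = 700        # bases per read

    # Each step doubles the number of reads, i.e. "repeating" the sequencing run.
    for num_reads in (20_000, 40_000, 80_000):
        cov = expected_coverage(num_reads, READ_LENGTH, GENOME_SIZE)
        missing = expected_uncovered_fraction(cov)
        print(f"{num_reads:>6} reads: {cov:.1f}x depth, "
              f"about {missing * GENOME_SIZE:,.0f} bases expected uncovered")
```

Doubling the reads doubles the depth and shrinks the expected uncovered fraction exponentially, so more sequencing does help with random gaps. Regions that stay unresolved at high depth are usually repeats longer than the reads, which need paired or longer reads rather than more of the same data.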