We've sequenced two ~7-9 Mbp microbial genomes using 454, and the reads were subsequently assembled with Newbler. For the first bacterium we have 8 scaffolds. These scaffolds contain gap regions, which I assumed were the result of sequencing coverage dropping off. However, when I look at the read depth for these regions, the contigs appear to terminate prematurely while there is still substantial read depth. I assume these reads could continue off the end of the contig but have been ignored. I've been reading the Newbler documentation, and it seems to indicate that contig extension stops when it reaches a repeat in the genome.
Can anyone offer any help on how we can close these scaffold gaps in silico? It seems that we should have the sequence data to get across, but I don't know how to do it.
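To make the read depth observation concrete, here is a minimal Python sketch of how the depth at each contig end could be summarised. It assumes a tab-separated table with three columns (contig name, 1-based position, depth); that layout, the function names `parse_depth_table` and `end_depths`, and the 100 bp window are illustrative assumptions, not something taken from the Newbler output.

```python
from collections import defaultdict


def parse_depth_table(lines):
    """Yield (contig, position, depth) from tab-separated text lines.

    Assumed layout: contig name, 1-based position, depth. Blank lines
    and lines starting with '#' are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        contig, pos, depth = line.split("\t")[:3]
        yield contig, int(pos), int(depth)


def end_depths(rows, window=100):
    """Return {contig: (mean depth at start, mean depth at end)}.

    The means are taken over the first and last `window` positions that
    appear in the table for each contig.
    """
    per_contig = defaultdict(list)
    for contig, pos, depth in rows:
        per_contig[contig].append((pos, depth))

    result = {}
    for contig, values in per_contig.items():
        values.sort()  # order by position
        depths = [d for _, d in values]
        head = depths[:window]
        tail = depths[-window:]
        result[contig] = (sum(head) / len(head), sum(tail) / len(tail))
    return result
```

With a file on disk, something like `end_depths(parse_depth_table(open("depth.tsv")))` would give one pair of values per contig (the file name is hypothetical). A contig end whose mean depth is close to, or above, the genome-wide average would fit the suspicion that the contig stopped at a repeat rather than at a coverage drop.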
Thanks for the suggestions. The gaps are between 500 and 1000 bp, so it looks like the sequence data won't span them because of the repeats in the genome. We do have a reference strain from the same species, but there seems to be a lot of recombination between the two genomes. I guess it's worth a look for the regions that show no sign of recombination. I tried AMOScmp as an alternative assembler, but it produced a much larger number of contigs than Newbler.
I'll try consed and autofinish too but I'm still waiting for the software.
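For anyone wanting to list the gaps and their sizes, a rough Python sketch follows. It assumes the scaffolds are in a FASTA file and that each gap is written as a run of N characters; the function names `read_fasta` and `find_gaps` are made up for illustration. Keep in mind that the length of an N run is only the assembler's estimate of the gap size.

```python
import re


def read_fasta(lines):
    """Yield (name, sequence) pairs from the lines of a FASTA file."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_gaps(sequence, min_len=1):
    """Return (start, end, length) for each run of N in the sequence.

    Coordinates are 1-based and inclusive. Runs shorter than `min_len`
    are ignored, which helps skip isolated ambiguous bases.
    """
    gaps = []
    for match in re.finditer(r"[Nn]+", sequence):
        length = match.end() - match.start()
        if length >= min_len:
            gaps.append((match.start() + 1, match.end(), length))
    return gaps
```

Looping over `read_fasta(open("scaffolds.fna"))` and calling `find_gaps(seq, min_len=10)` on each sequence would give the gap coordinates per scaffold (the file name and the threshold are placeholders). The flanking coordinates can then be used to pull out the contig ends for comparison against the reference strain.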