How To Close Gaps In A 454 Assembly In Silico?
14.6 years ago
Michael Barton ★ 1.9k

We've sequenced two ~7-9 Mbp microbial genomes using 454, and the reads were subsequently assembled with Newbler. For the first bacterium we have 8 sequence scaffolds. These scaffolds contain gap regions, which I assumed were the result of the sequencing coverage dropping off. However, when I look at the read depth for these regions, the contig appears to terminate prematurely while there is still a large amount of read depth. I assume that these reads could still continue off the end of the contig but have been ignored. I've been reading the Newbler documentation, and it seems to indicate that contig extension stops when there are repeats in the genome.

Can anyone offer any help on how we can close these scaffold gaps in silico? It seems that we should have the sequence data to get across, but I don't know how to do it.

sequencing genome assembly • 6.9k views
ADD COMMENT
14.6 years ago
Bioinfo ▴ 330

Michael,

Welcome to the wonderful world of genome finishing. If the repeats are longer than the length of a read (ballpark 300-600 bp for FLX Titanium), you will not be able to span them. These areas may also be caused by the homopolymer issues that this platform suffers from, or by other mysterious artifacts. One option is to use software like CONSED or CLC Bio to visualize the areas, and work your way into the repeats by finding reads that are anchored in unique sequence. Designing primers that span the areas and using Sanger sequencing may also be helpful. I assume you don't have a reference of any type to use in piecing things together?
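To make the "reads anchored in unique sequence" idea concrete, here is a minimal Python sketch. It is not how CONSED or CLC Bio work internally. It assumes exact matches on the forward strand only, and the function name `find_extending_reads` and the `anchor_len` parameter are made up for illustration. Real 454 reads contain errors, so in practice an aligner would replace the exact string search.

```python
def find_extending_reads(contig, reads, anchor_len=20):
    """Return (read, overhang) pairs for reads that contain the last
    anchor_len bases of the contig and continue past the contig end.

    Assumption: exact matching, forward strand only (illustrative)."""
    anchor = contig[-anchor_len:]
    hits = []
    for read in reads:
        pos = read.find(anchor)
        if pos == -1:
            continue  # read does not contain the contig-end anchor
        # Everything in the read up to the end of the anchor must agree
        # with the contig, i.e. the read is anchored in the contig.
        if not contig.endswith(read[:pos + anchor_len]):
            continue
        overhang = read[pos + anchor_len:]
        if overhang:  # keep only reads that run off the contig end
            hits.append((read, overhang))
    return hits


# Toy data: only the first read is anchored and extends past the end.
contig = "ACGTACGTTTGACCAGT"
reads = ["TTGACCAGTGGCA", "GACCAGTAA", "CCTGACCAGTGGC", "ACGTACGT"]
for read, overhang in find_extending_reads(contig, reads, anchor_len=8):
    print(read, "extends the contig by", overhang)
```

The third toy read shares the anchor but disagrees with the contig upstream of it, which is the pattern you would expect from a different copy of a repeat, so it is rejected.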

You can also run a different assembler and then do a MUMmer mapping to see if any of the areas were taken care of by the other assembler. You would be amazed at how differently assemblers handle the same data.
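Once the second assembly has been mapped onto the scaffolds, the comparison could be sketched roughly as below. This is only an illustration: it assumes you have already extracted alignment coordinates into `(ref_start, ref_end, query_id)` tuples (for example from a MUMmer coordinate table; the parsing is not shown), and the function name `gaps_bridged` and the `slack` parameter are hypothetical. A gap is flagged when the same contig from the other assembler aligns on both flanks; orientation and query coordinates are not checked here.

```python
def gaps_bridged(gaps, alignments, slack=50):
    """Map each gap to the contigs of the other assembly that align
    on both of its flanks.

    gaps:       list of (start, end) on the scaffold, 1-based inclusive
    alignments: list of (ref_start, ref_end, query_id) on the same scaffold
    slack:      how far from the gap edge (bp) an alignment may stop or start
    """
    bridged = {}
    for gap_start, gap_end in gaps:
        # Contigs with an alignment ending just before the gap
        left = {q for s, e, q in alignments
                if gap_start - slack <= e < gap_start}
        # Contigs with an alignment starting just after the gap
        right = {q for s, e, q in alignments
                 if gap_end < s <= gap_end + slack}
        bridged[(gap_start, gap_end)] = sorted(left & right)
    return bridged


# Toy coordinates: ctgA flanks the first gap on both sides,
# while the second gap has different contigs on each side.
gaps = [(1001, 1600), (5001, 5800)]
alignments = [
    (1, 1000, "ctgA"),
    (1601, 4000, "ctgA"),
    (4100, 4990, "ctgB"),
    (5801, 7000, "ctgC"),
]
for gap, contigs in gaps_bridged(gaps, alignments).items():
    print(gap, contigs or "not bridged")
```

Any gap flagged this way is only a candidate; the bridging sequence still needs to be checked against the reads.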

ADD COMMENT

Thanks for the suggestions. The gaps are between 500 and 1000 bp, so it looks like the reads won't span them because of the repeats in the genome. We do have a reference strain from the same species, but there seems to be a lot of recombination between the two genomes. I guess it's worth a look for the regions that show no sign of recombination. I tried AMOScmp as an alternative assembler, but this produced a much larger number of contigs than Newbler.

I'll try consed and autofinish too but I'm still waiting for the software.
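For anyone wanting to tabulate gap sizes like the 500-1000 bp figure above, a small Python sketch follows. It assumes the gaps appear as runs of `N` in the scaffold FASTA; the function names are made up for illustration. Keep in mind that the length of an `N` run is normally an estimate, not a measured size.

```python
import re


def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def list_gaps(lines, min_len=1):
    """Return (scaffold, start, end, length) for every run of N.

    Coordinates are 1-based and inclusive. Assumes gaps are written
    as N characters in the scaffold sequence."""
    gaps = []
    for name, seq in parse_fasta(lines):
        for match in re.finditer(r"[Nn]+", seq):
            length = match.end() - match.start()
            if length >= min_len:
                gaps.append((name, match.start() + 1, match.end(), length))
    return gaps


# Toy scaffold with two gaps; with a real file you would pass an open
# file handle instead of this list of lines.
toy = [">scaffold00001", "ACGTACGTNNNNNNNNAC", "GTNNNNACGT"]
for gap in list_gaps(toy):
    print(*gap, sep="\t")
```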

ADD REPLY
14.6 years ago
Wjeck ▴ 490

Generally these gaps are very tricky to span using in silico techniques only, even with 454 reads. You might have to try the wet-bench solution, which is to use Illumina PE reads with a large "insert" size to create a scaffold that jumps those gaps.

There's this project using that technique (shameless self promotion):

http://www.ncbi.nlm.nih.gov.libproxy.lib.unc.edu/pubmed/19015323

But I think others have made considerable improvements since then.

ADD COMMENT

Thanks for the suggestions. We're considering SRS for a second genome we have, which is even more fragmented: >50 contigs at 17X coverage. Probably a large number of repeats ...

ADD REPLY
14.5 years ago
lexnederbragt ★ 1.3k

In this PDF:

http://www.jgi.doe.gov/News/primer/primer09fall.pdf

on page 2, there is a program mentioned to close gaps in 454 assemblies. We tried it out on a bacterial genome, and it seems to work for a subset of the gaps in the scaffolds. We are currently quality checking the closed gaps...

ADD COMMENT

Thanks. That looks useful. How are you quality checking the gaps?

ADD REPLY

If you really must know :-) we have early access to the graph viewer, and use that to check which contigs (according to the graph) could (should) fit in the gap, then align their sequences to the proposed gap-closing sequence. In addition, we did some gap-closing PCRs earlier and check against their sequences. Finally, we are considering checking a bunch of them with new PCRs.

ADD REPLY
14.5 years ago
User 59 13k

There's also an approach for generating gap-spanning contigs by aligning sequences at the contig ends and performing local assemblies.

http://genomebiology.com/2010/11/4/R41

"Advances in sequencing technology allow genomes to be sequenced at vastly decreased costs. However, the assembled data frequently are highly fragmented with many gaps. We present a practical approach that uses Illumina sequences to improve draft genome assemblies by aligning sequences against contig ends and performing local assemblies to produce gap-spanning contigs. The continuity of a draft genome can thus be substantially improved, often without the need to generate new data."
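As a rough illustration of the "align against contig ends, then assemble locally" idea in that abstract, here is a toy Python sketch. It is not the method from the paper: the local assembly step is swapped for a simple majority-vote consensus of read overhangs, matching is exact and forward-strand only, and the names `extend_once`, `extend_iteratively`, `anchor_len` and `min_support` are invented for this example.

```python
from collections import Counter


def extend_once(contig, reads, anchor_len=12, min_support=2):
    """Extend the contig end by the consensus of anchored read overhangs."""
    anchor = contig[-anchor_len:]
    overhangs = []
    for read in reads:
        pos = read.find(anchor)
        # Keep reads that match the contig end exactly and run past it
        if pos != -1 and contig.endswith(read[:pos + anchor_len]):
            tail = read[pos + anchor_len:]
            if tail:
                overhangs.append(tail)
    extension = []
    for i in range(max((len(t) for t in overhangs), default=0)):
        column = Counter(t[i] for t in overhangs if len(t) > i)
        base, count = column.most_common(1)[0]
        # Stop when support is thin or there is no strict majority
        if count < min_support or count * 2 <= sum(column.values()):
            break
        extension.append(base)
    return contig + "".join(extension)


def extend_iteratively(contig, reads, rounds=10, **kwargs):
    """Repeat the extension until nothing more is added or rounds run out."""
    for _ in range(rounds):
        longer = extend_once(contig, reads, **kwargs)
        if longer == contig:
            break
        contig = longer
    return contig


# Toy example: reads are exact substrings of a 25 bp "genome".
genome = "GATTACAGGCTCAGTTCGAACTGGA"
reads = [genome[3:15], genome[5:17], genome[10:22], genome[12:25], genome[11:23]]
print(extend_iteratively(genome[:10], reads, anchor_len=5, min_support=2))
```

In this toy run the extension stops two bases short of the end of the genome, because only one read covers them and `min_support` is 2. In a repeat, the majority vote would also stop where reads from different repeat copies start to disagree.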

ADD COMMENT

Just been looking at IMAGE and I think it's specifically focused towards closing gaps in Illumina sequencing data.

ADD REPLY
