454 Gap Closures
14.0 years ago
Lee Katz ★ 3.2k

Is there a way to estimate how large a gap might be after 454 pyrosequencing followed by assembly with Newbler? I have several closed reference genomes, and I know that my read length is about 400 bp with about 40x coverage. Therefore I have high confidence that these gaps are due to repeat regions.

Edit: I guess this might be answerable by knowing the repeat regions in a genome. How would I identify repeat regions from the sequence alone? If I knew the length L of a repeat region, I would estimate the gap as gapLength = L - (2 * 400).
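As a rough illustration of the idea in the edit, here is a minimal Python sketch that marks regions of a closed reference covered by exact repeated k-mers and then applies gapLength = L - (2 * readLength). The function names, the choice of k, and the clamping of negative results to zero are assumptions made for this example. It only finds exact forward-strand repeats, so a dedicated repeat finder would be needed for diverged or reverse-complemented copies.

```python
from collections import defaultdict


def repeat_intervals(seq, k):
    """Return merged (start, end) intervals covered by k-mers seen more than once.

    Only exact, forward-strand repeats are detected (a simplifying assumption).
    Intervals are half-open: seq[start:end].
    """
    if k <= 0:
        raise ValueError("k must be positive")
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    # Start positions of every k-mer that occurs at least twice.
    starts = sorted(i for hits in positions.values() if len(hits) > 1 for i in hits)

    intervals = []
    for s in starts:
        e = s + k
        if intervals and s <= intervals[-1][1]:
            # Overlapping or adjacent k-mers extend the current interval.
            intervals[-1][1] = max(intervals[-1][1], e)
        else:
            intervals.append([s, e])
    return [(s, e) for s, e in intervals]


def estimated_gap_length(repeat_length, read_length=400):
    """gapLength = L - (2 * readLength), clamped at zero (clamping is an assumption)."""
    return max(0, repeat_length - 2 * read_length)


if __name__ == "__main__":
    # Toy sequence: the same 10 bp unit placed twice between unique flanks.
    unit = "ACGGTCAAGT"
    toy = "TTTTT" + unit + "CCCCC" + unit + "GGAGA"
    for start, end in repeat_intervals(toy, k=6):
        print(start, end, end - start)
    print(estimated_gap_length(1500))  # 1500 - 800 = 700
```

On a real bacterial genome, k would be chosen much larger (for example, close to the read length), since repeats shorter than a read can be spanned and should not cause gaps.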

assembly • 3.4k views

Is it a eukaryotic or bacterial genome?


bacterial genome

14.0 years ago
Darked89 4.7k

Repeat identification:

http://openwetware.org/wiki/Wikiomics:Repeat_finding

Newbler does not exclude all repetitive sequences from the assembly, at least not with all settings. I found repeats in a Newbler-assembled plant genome.


For the fire ant genome, we noticed dramatic improvements in Newbler performance when we increased parameter stringency to a minimum overlap of 100 bp between reads and 98 or 99% identity. The best explanation I see is that those parameters helped resolve repeats: under the more stringent settings, many sequences that were previously collapsed as repeats became unique.
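To make the reasoning concrete, here is a small Python sketch of how a stricter overlap rule can keep slightly diverged repeat copies apart. It is not Newbler's algorithm; `percent_identity`, `accept_overlap`, the toy sequences, and the looser 40 bp / 90% settings used for comparison are all assumptions for illustration, and the two strings are treated as already aligned with no indels.

```python
def percent_identity(a, b):
    """Percent of matching positions between two equal-length, pre-aligned strings."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def accept_overlap(a, b, min_length=100, min_identity=98.0):
    """Accept an overlap only if it is long enough and similar enough."""
    return len(a) >= min_length and percent_identity(a, b) >= min_identity


if __name__ == "__main__":
    # Two hypothetical copies of a 120 bp repeat that differ at 4 positions.
    copy_a = "ACGT" * 30
    bases = list(copy_a)
    for pos in (10, 40, 70, 100):
        bases[pos] = "A" if bases[pos] != "A" else "C"
    copy_b = "".join(bases)

    print(percent_identity(copy_a, copy_b))  # 116 of 120 positions match
    # Looser settings would join reads from the two copies...
    print(accept_overlap(copy_a, copy_b, min_length=40, min_identity=90.0))
    # ...while the stricter settings keep them apart.
    print(accept_overlap(copy_a, copy_b, min_length=100, min_identity=98.0))
```

With 116 of 120 positions matching (about 96.7% identity), the overlap passes a 90% cutoff but fails a 98% cutoff, which is one way reads from near-identical repeat copies could end up in separate contigs.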


Thanks for the info!

14.0 years ago
Ketil 4.1k

I think that even with 40x coverage, you're not guaranteed to have reads covering all gaps, and the theoretical models don't work so well in practice. I don't know any exact numbers for this (and it probably varies from run to run and lab to lab), but coverage tends to be uneven, and there could be features of the sequence that make some parts rare or unsequenceable. It's well known that you get duplicated clones (the same clone on multiple beads), which is one form of unevenness.
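For reference, this is what one idealized model predicts. The sketch below uses the Lander-Waterman approximation, which assumes reads are placed uniformly at random and ignores the minimum overlap needed to join reads. The 5 Mb genome size is a hypothetical value chosen for the example, and the function name is made up.

```python
import math


def lander_waterman(genome_size, read_length, coverage):
    """Idealized shotgun statistics under uniform random read placement.

    Assumes no minimum overlap is required to join reads (a simplification).
    """
    n_reads = coverage * genome_size / read_length
    uncovered_fraction = math.exp(-coverage)  # P(a given base has no read)
    return {
        "reads": n_reads,
        "uncovered_fraction": uncovered_fraction,
        "expected_uncovered_bases": genome_size * uncovered_fraction,
        # Expected number of contigs, and so roughly the number of gaps.
        "expected_gaps": n_reads * uncovered_fraction,
    }


if __name__ == "__main__":
    # Hypothetical 5 Mb bacterial genome, 400 bp reads, 40x coverage.
    stats = lander_waterman(genome_size=5_000_000, read_length=400, coverage=40)
    for key, value in stats.items():
        print(key, value)
```

Under these assumptions the expected number of coverage gaps at 40x is effectively zero, so any gaps that remain in a real assembly point to things the model leaves out, such as uneven coverage, hard-to-sequence regions, or repeats.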


Newbler takes care of these duplicates: it identifies them and does not remove them, but when it determines the consensus bases, for example, the duplicates count as one (the same goes for average read depth).

14.0 years ago
lexnederbragt ★ 1.3k

I assume you have shotgun reads only? For newbler assemblies, you can actually find the repeats among the contigs by looking at the per-contig read depth. With apologies for the self-promotion, here is a paper describing just that: http://www.hindawi.com/journals/seq/2010/782465.html. Contigs with higher-than-normal read depth are collapsed repeats, and the depth is proportional to the copy number.

This will at least tell you which contigs are the repeats. Looking at the 454ContigGraph file could tell you which contigs are the 'neighbours' of the repeats.
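The depth-based idea can be sketched in a few lines of Python. This is only an illustration, not the method from the paper: the contig names and depths are made up, the median is used as the single-copy baseline, and the 1.5x cutoff is an arbitrary assumption. Per-contig depths would need to be extracted from the assembly output first.

```python
from statistics import median


def flag_collapsed_repeats(contig_depths, min_ratio=1.5):
    """Estimate copy number for contigs whose depth is well above the baseline.

    contig_depths maps contig name -> average read depth.
    The median depth is taken as the single-copy baseline (an assumption).
    """
    if not contig_depths:
        raise ValueError("no contigs given")
    baseline = median(contig_depths.values())
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")

    flagged = {}
    for name, depth in contig_depths.items():
        ratio = depth / baseline
        if ratio >= min_ratio:
            # Depth is roughly proportional to copy number.
            flagged[name] = round(ratio)
    return flagged


if __name__ == "__main__":
    # Hypothetical per-contig average depths from a roughly 40x assembly.
    depths = {
        "contig00001": 39.0,
        "contig00002": 41.0,
        "contig00003": 40.0,
        "contig00004": 122.0,
        "contig00005": 79.0,
    }
    print(flag_collapsed_repeats(depths))
```

If many contigs are repeats, the plain median can be pulled upward; weighting by contig length, or taking the baseline from the longest contigs only, may give a steadier single-copy estimate.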
