Rescaffolding a genome assembly?
7.4 years ago
cmdcolin ★ 4.0k

Hello, I have found a region of a vertebrate genome assembly that, when resequencing data is aligned to it, shows a large number of discordant read pairs and larger-than-expected insert sizes.
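One way to make "a large number of discordant reads" concrete is to tabulate, per window, the fraction of paired reads that are either not flagged as properly paired or have an unusually large template length. The following is a minimal sketch that reads plain SAM text; the function name `discordant_windows` and all thresholds are illustrative assumptions, not values from this thread, and should be tuned to the library's real insert-size distribution.

```python
from collections import defaultdict

def discordant_windows(sam_lines, window=1000, max_insert=1000,
                       min_reads=5, min_fraction=0.5):
    """Return {(contig, window_start): fraction_discordant} for suspicious windows.

    Thresholds are hypothetical defaults; adjust them for your own data.
    """
    total = defaultdict(int)
    bad = defaultdict(int)
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])             # FLAG column
        contig = fields[2]                # RNAME column
        pos = int(fields[3])              # 1-based leftmost position
        tlen = int(fields[8])             # TLEN (observed template length)
        # Keep only paired, mapped, primary alignments.
        if not flag & 0x1 or flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        key = (contig, (pos - 1) // window * window)
        total[key] += 1
        # Discordant here means: not a proper pair, or insert larger than expected.
        if not flag & 0x2 or abs(tlen) > max_insert:
            bad[key] += 1
    return {
        key: bad[key] / n
        for key, n in total.items()
        if n >= min_reads and bad[key] / n >= min_fraction
    }
```

Something like `samtools view aln.bam | python script.py` could feed it, with the script passing `sys.stdin` to the function; windows that come back flagged in every resequenced sample are the ones worth inspecting in a genome browser.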

This looks like an error in the genome assembly, for example contigs that are duplicated or misordered.

Therefore I would like to try to "rescaffold" this region and correct contig ordering, remove duplicate sequence, incorporate missing sequences, etc.

Are there any guidelines for which programs to use? Should I create an entire de novo assembly from the resequencing data and compare it to the original, or can I, for example, run specific steps of the ABySS (or another program's) pipeline to fix the assembly?

abyss scaffolding • 2.8k views

There's nothing in the ABySS toolset for identifying/cutting misassemblies, unfortunately. I noticed that there is a tool called NxRepair that detects/cuts misassemblies using mate pair reads, but I have not tried it.


It may be an assembly error, it may be existing biological variation.


I'm 99% sure it is an assembly error, since all the resequencing data shows the same odd pattern. But even if it were biological variation, that'd be interesting, and maybe you'd want to resolve the variant genomic assembly in that region!


You can probably break the contigs into fragments at the break-points or assembly errors, then re-scaffold the fragments with medusa or another scaffolder.
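The breaking step can be sketched in a few lines of Python. This assumes you already have 0-based break-point coordinates for a contig (for example from inspecting discordant read pile-ups); the function name `break_contig` and the fragment naming scheme are hypothetical, chosen only for illustration.

```python
def break_contig(name, seq, breakpoints):
    """Split seq at the given 0-based breakpoints.

    Returns a list of (fragment_name, fragment_sequence) tuples. Breakpoints
    outside the sequence, or at its very ends, are ignored.
    """
    cuts = sorted({b for b in breakpoints if 0 < b < len(seq)})
    edges = [0] + cuts + [len(seq)]
    fragments = []
    for i, (start, end) in enumerate(zip(edges, edges[1:]), start=1):
        # Record the source coordinates in the name so fragments can be traced back.
        fragments.append((f"{name}_frag{i}_{start}_{end}", seq[start:end]))
    return fragments


if __name__ == "__main__":
    for frag_name, frag_seq in break_contig("ctg1", "AAAACCCCGGGG", [4, 8]):
        print(f">{frag_name}\n{frag_seq}")
```

The fragments can then be written out as FASTA and handed to whichever scaffolder you want to try.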


Thanks, Rohit. Do you have any other recommendations for scaffolders? I will give medusa a try anyways!

7.4 years ago
cmdcolin ★ 4.0k

I am currently considering a number of options.

  • reference based scaffolders as mentioned in comments (medusa, raca)
  • "rescaffolding" by manually breaking the scaffolds back into contigs and then re-running scaffolding tools over them
  • patching tools such as nxrepair
  • running assembly evaluation tools like REAPR to assess assembly quality, which can then clear out erroneous regions with Ns, and then running gap-closing tools, e.g. sealer
  • performing a new de novo assembly and comparing it to the original assembly via blat, blast, or similar
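For the "rescaffolding" option, breaking scaffolds back into contigs usually means splitting at runs of N, since scaffolders typically join contigs with N gaps. Here is a minimal sketch; the function name `split_scaffold` and the `min_gap` default of 10 are assumptions for illustration, and the right gap threshold depends on how the original assembler encoded its gaps.

```python
import re

def split_scaffold(name, seq, min_gap=10):
    """Split a scaffold into contigs at runs of at least min_gap N characters.

    Shorter N runs are left in place, since they may be ambiguous bases
    rather than scaffold gaps. Returns a list of (contig_name, sequence).
    """
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    pieces = []
    start = 0
    for match in gap.finditer(seq):
        if match.start() > start:          # skip empty pieces (leading gaps)
            pieces.append(seq[start:match.start()])
        start = match.end()
    if start < len(seq):                   # trailing contig after the last gap
        pieces.append(seq[start:])
    return [(f"{name}_contig{i}", s) for i, s in enumerate(pieces, start=1)]


if __name__ == "__main__":
    scaffold = "ACGT" + "N" * 12 + "GGCC" + "N" * 3 + "TT"
    for contig_name, contig_seq in split_scaffold("scaf1", scaffold):
        print(f">{contig_name}\n{contig_seq}")
```

The resulting contigs can be passed to a scaffolder together with the paired reads, and the new scaffolds compared against the original arrangement.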

Nothing has panned out yet, since it's hard to streamline these, but I can update this thread if there are any results.


Thanks for posting the update -- I'm interested to hear how it goes for you!


I wasn't able to do a really comprehensive comparison of all the options, but the "rescaffolding" option, running SSPACE with mate-pair reads on the broken-up contigs, produced some new scaffolds that suggested a different arrangement of contigs (it stitched together contigs that were on different scaffolds into a single scaffold). That confirmed some suspicions about a misassembly, and the result seemed to make sense: there was a broken gene on two different scaffolds, and SSPACE pieced the two parts together next to each other. This suggests either that the two broken genes were the same gene, or possibly that it was a tandem duplication.


Good to know. Thanks for posting back!
