Question

How can I fix frameshift errors in an assembled genome?

4.7 years ago

star715 ▴ 50

I have a genome assembled from PacBio reads. However, I have noticed many frameshifts in the assembly, and because of them some genes are not annotated even though they are present in the genome. How can I fix this?

Assembly genome polishing pacbio • 1.3k views

updated 4.7 years ago by predeus ★ 2.1k • written 4.7 years ago by star715 ▴ 50

You can use Quiver (if you have old RS reads) or Arrow (for Sequel reads) to polish your assembly. You can also give Pilon a try; the newest version supports long reads.

4.7 years ago by alex.zaccaron ▴ 470

score 0 · Answer 1 · 2020-03-19

4.7 years ago

predeus ★ 2.1k

What kind of genome is it, and which assembler did you use? Normally, a PacBio-only assembly should not contain many errors, since assemblers polish the assembly before it is finalized.

At any rate, you can polish the assembly using Racon (https://github.com/isovic/racon) or the built-in polisher of the Flye assembler (https://github.com/fenderglass/Flye).
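As a rough illustration of the Racon route, here is a minimal Python sketch that only builds the command lines for a few rounds of minimap2 mapping followed by Racon polishing; it does not run anything. The function names, file names, output prefix, thread count and number of rounds are all placeholders chosen for the example, and `map-pb` is assumed to be the appropriate minimap2 preset for your reads.

```python
import shlex


def plan_polishing(reads, assembly, rounds=2, threads=8, prefix="polished"):
    """Build (command, output_file) pairs for iterative Racon polishing.

    Each round maps the reads to the current assembly with minimap2
    (PAF on stdout) and then runs Racon (polished FASTA on stdout).
    All file names here are hypothetical placeholders.
    """
    plan = []
    current = assembly
    for i in range(1, rounds + 1):
        overlaps = f"{prefix}.round{i}.paf"
        polished = f"{prefix}.round{i}.fasta"
        # minimap2 [options] <target> <query>; map-pb is the PacBio CLR preset
        minimap2_cmd = ["minimap2", "-x", "map-pb", "-t", str(threads),
                        current, reads]
        # racon [options] <sequences> <overlaps> <target sequences>
        racon_cmd = ["racon", "-t", str(threads), reads, overlaps, current]
        plan.append((minimap2_cmd, overlaps))
        plan.append((racon_cmd, polished))
        current = polished  # the next round polishes this round's output
    return plan


def to_shell(plan):
    """Render the plan as shell lines that redirect stdout to the output file."""
    return [f"{shlex.join(cmd)} > {shlex.quote(out)}" for cmd, out in plan]
```

Printing `to_shell(plan_polishing("reads.fastq", "assembly.fasta"))` gives four lines you can review and run by hand. Re-annotating after each round shows whether the number of broken genes is actually going down.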

4.7 years ago by predeus ★ 2.1k

It is PacBio RS II data, and I have been using HGAP3/HGAP4 to assemble it. I tried Flye, but it gave more frameshift errors. Can Pilon/Racon work with only PacBio data and no short reads?

4.7 years ago by star715 ▴ 50

Yes, both work fine with long reads only.

Why are you so sure these are in fact errors? Did you look at the alignments in a genome browser, e.g. IGV? Visual inspection can often be misleading; you need a pileup and some sort of summary. Loading the BAM with its BAI index in IGV helps with this nicely: you'll see variants in the coverage track right away.
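If you want the summary as numbers rather than as a picture, something like the sketch below could be a starting point. It assumes the six-column text output of `samtools mpileup` for a single BAM of reads mapped back to the assembly, and reports positions where most reads carry the same indel relative to the assembly. The function names and the depth and fraction thresholds are arbitrary choices for illustration.

```python
from collections import Counter


def indel_lengths(bases):
    """Count indels in an mpileup read-bases string, keyed by signed length
    (+n for an n-bp insertion, -n for an n-bp deletion)."""
    counts = Counter()
    i, n = 0, len(bases)
    while i < n:
        c = bases[i]
        if c == "^":
            i += 2  # read start marker; the next character is a mapping quality
        elif c in "+-":
            j = i + 1
            while j < n and bases[j].isdigit():
                j += 1
            length = int(bases[i + 1:j])
            counts[length if c == "+" else -length] += 1
            i = j + length  # skip the inserted or deleted sequence itself
        else:
            i += 1
    return counts


def summarize_pileup(lines, min_depth=10, min_fraction=0.5):
    """Return (contig, pos, depth, signed_len, fraction, shifts_frame) for
    positions where one indel is supported by at least min_fraction of reads.

    The indel sits between pos and pos + 1. shifts_frame only says the length
    is not a multiple of 3; it matters only inside a coding region.
    """
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        contig, pos, depth, bases = fields[0], int(fields[1]), int(fields[3]), fields[4]
        if depth < min_depth:
            continue
        counts = indel_lengths(bases)
        if not counts:
            continue
        signed_len, support = counts.most_common(1)[0]
        fraction = support / depth
        if fraction >= min_fraction:
            hits.append((contig, pos, depth, signed_len, fraction, signed_len % 3 != 0))
    return hits
```

Positions reported inside the genes that fail to annotate are the likely assembly errors. If the reads agree with the assembly there, the frameshift may be real.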

4.7 years ago by predeus ★ 2.1k