Question

SNP calling for correcting errors?

1

8.9 years ago

marcela.uliano ▴ 90

Hey guys,

Let's say one has scaffolded a draft genome with PacBio subreads and PBJelly, but is still concerned about insufficient coverage of the Illumina contigs and about PacBio indel errors being carried over into the final draft.

This person also has high-quality Illumina RNA-seq data and can map it to the draft genome. Could one use a SNP caller, not to call SNPs, but to find errors in the draft genome?

Bear in mind that the species in question is diploid, and that only RNA-seq from a single individual (one sample) would be aligned.

Or do you guys know of any other post-assembly error correctors for draft genomes?

Thank you, guys!
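To make the idea in the question concrete: with a single diploid sample, a site where the reads carry no copy of the draft's own base (e.g. genotype 1/1) hints that the draft base may be wrong, while a 0/1 call looks more like real heterozygosity. Below is a minimal sketch of that triage in Python. It assumes a single-sample VCF from whatever caller was used; the function name `classify_calls` and the `min_qual` cutoff are made up for illustration. Keep in mind that RNA-seq only covers transcribed regions, so this can only flag candidates there.

```python
def classify_calls(vcf_lines, min_qual=30.0):
    """Split single-sample VCF records into candidate draft errors and het sites.

    min_qual is an arbitrary illustrative threshold, not a recommendation.
    """
    candidates, hets = [], []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns, so no genotype to inspect
        chrom, pos, ref, alt, qual = (fields[0], int(fields[1]),
                                      fields[3], fields[4], fields[5])
        if qual == "." or float(qual) < min_qual:
            continue
        keys = fields[8].split(":")
        if "GT" not in keys:
            continue
        genotype = fields[9].split(":")[keys.index("GT")]
        alleles = genotype.replace("|", "/").split("/")
        if "." in alleles:
            continue  # missing genotype
        record = (chrom, pos, ref, alt)
        if "0" not in alleles:
            # No read-supported copy of the draft base: possible assembly error.
            candidates.append(record)
        elif len(set(alleles)) > 1:
            # Draft base plus another allele: more likely true heterozygosity.
            hets.append(record)
    return candidates, hets
```

Usage would be something like `candidates, hets = classify_calls(open("calls.vcf"))`, where `calls.vcf` is a hypothetical file name. The candidates still need manual or tool-based review before editing the assembly.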

RNA-Seq genome SNP Assembly alignment • 2.8k views

3

Google: "pilon broad".

8.9 years ago by lh3 33k

0

Thank you lh3 and ALchEmiXt,

Running Pilon iteratively is exactly what I'm doing and it's working great! I can tell from the number of CEGs I recover in the draft genome before and after Pilon.

Thanks guys!

8.8 years ago by marcela.uliano ▴ 90

score 0 · Answer 1 · 2016-06-14

As mentioned, you could use Pilon as a curation tool to polish the assembly.

We actually run Pilon in iterated mode, since we noticed that, when combining different technologies, not all correctable changes are made in the first round. We usually run a short loop of at most 4 to 6 iterations, taking the output of each Pilon iteration as the input to the next. This has worked quite well for us.
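The loop described above might look roughly like the following Python sketch. It is an illustration under assumptions, not our exact pipeline: it assumes paired Illumina reads mapped with `bwa mem`, a local `pilon.jar`, and made-up file prefixes such as `pilon_round1`. It stops early when Pilon's `.changes` file (written with `--changes`) is empty, which is one possible way to decide that further rounds are pointless.

```python
import shlex
import subprocess


def pilon_round_commands(genome, reads1, reads2, prefix,
                         pilon_jar="pilon.jar", threads=4):
    """Shell commands for one round: index, map, sort, polish."""
    q = shlex.quote
    bam = f"{prefix}.bam"
    return [
        f"bwa index {q(genome)}",
        f"bwa mem -t {threads} {q(genome)} {q(reads1)} {q(reads2)}"
        f" | samtools sort -o {q(bam)} -",
        f"samtools index {q(bam)}",
        # --changes makes Pilon write <prefix>.changes listing its edits.
        f"java -jar {q(pilon_jar)} --genome {q(genome)} --frags {q(bam)}"
        f" --output {q(prefix)} --changes",
    ]


def count_changes(lines):
    """Count non-empty lines, i.e. edits listed in a .changes file."""
    return sum(1 for line in lines if line.strip())


def polish(genome, reads1, reads2, max_rounds=6, run=subprocess.run):
    """Iterate Pilon until a round makes no edits or max_rounds is reached."""
    current = genome
    for i in range(1, max_rounds + 1):
        prefix = f"pilon_round{i}"  # hypothetical naming scheme
        for cmd in pilon_round_commands(current, reads1, reads2, prefix):
            run(cmd, shell=True, check=True)
        current = f"{prefix}.fasta"  # Pilon writes the polished assembly here
        with open(f"{prefix}.changes") as handle:
            if count_changes(handle) == 0:
                break
    return current
```

Calling `polish("draft.fasta", "reads_1.fq", "reads_2.fq")` (hypothetical file names) would return the path of the last polished FASTA. Re-mapping against the new assembly in every round matters, because the coordinates shift once indels are corrected.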

Another approach we use is to map the reads with bowtie2 or bwa and then use samtools to generate and extract a consensus. This consensus then serves as the reference for the next iteration. Usually (depending on quality) the consensus building saturates at 5 iterations. This approach can also be used to generate your own consensus starting from a closely related sequence (your mileage may vary, though, depending on how close the sequence is).
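For the mapping-and-consensus route, here is a rough Python sketch of how the rounds could be chained. The exact commands are an assumption on my side: it uses bowtie2 for mapping and the `bcftools mpileup` / `bcftools call` / `bcftools consensus` chain to build the consensus, which is one common way to do it and not necessarily what was run originally. File prefixes such as `consensus_round1` are made up. Note that `bcftools consensus` applies the ALT alleles it finds in the VCF, so you may want to filter heterozygous calls first.

```python
import shlex


def consensus_round_commands(reference, reads1, reads2, prefix, threads=4):
    """Shell commands for one map-call-consensus round."""
    q = shlex.quote
    index, bam = f"{prefix}.idx", f"{prefix}.bam"
    vcf, fasta = f"{prefix}.vcf.gz", f"{prefix}.fasta"
    return [
        f"bowtie2-build {q(reference)} {q(index)}",
        f"bowtie2 -p {threads} -x {q(index)} -1 {q(reads1)} -2 {q(reads2)}"
        f" | samtools sort -o {q(bam)} -",
        # Call variants against the current reference (variant sites only).
        f"bcftools mpileup -Ou -f {q(reference)} {q(bam)}"
        f" | bcftools call -mv -Oz -o {q(vcf)}",
        f"bcftools index {q(vcf)}",
        # Write a new FASTA with the called variants applied.
        f"bcftools consensus -f {q(reference)} -o {q(fasta)} {q(vcf)}",
    ]


def plan_consensus_rounds(reference, reads1, reads2, rounds=5):
    """Chain rounds so each consensus becomes the next round's reference."""
    plan, current = [], reference
    for i in range(1, rounds + 1):
        prefix = f"consensus_round{i}"  # hypothetical naming scheme
        plan.append(consensus_round_commands(current, reads1, reads2, prefix))
        current = f"{prefix}.fasta"
    return plan
```

The plan is only a list of command strings per round; each could be executed with `subprocess.run(cmd, shell=True, check=True)`. Counting the variants called in each round is a simple way to see when the consensus has saturated.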

score 0 · Answer 2 · 2017-12-14

7.4 years ago

tjduncan ▴ 280

The Hercules package that recently appeared on bioRxiv may be perfect for this. It is a profile HMM-based hybrid error correction algorithm for long reads.

https://www.biorxiv.org/content/early/2017/12/13/233080

https://github.com/BilkentCompGen/Hercules
