Question

Correcting sequence reads based on ORF disruption from reference sequence.

10.5 years ago

Daniel ★ 4.0k

I have a set of 454 metagenomic (amplicon) reads from which I am attempting to extract a particular taxon group.

I have done the following:

1. Selected reference sequences for the key taxonomic family (from NCBI, plus some of our own).
2. Created a reference alignment using MUSCLE.
3. Aligned the NGS query sequences to the reference alignment using PyNAST, keeping those sequences that align closely and discarding unrelated taxa.

I now want to correct the extracted sequences based on the ORF of the functional gene we are using (rbcL). I can see a homopolymer that is causing an insertion, along with other errors that disrupt the reading frame. Is there a way to correct the sequences against the reference ORF?
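To make the problem concrete, here is a minimal Python sketch of how reads could be screened for the two symptoms described above: homopolymer runs (the typical 454 error site) and stop codons appearing in a frame. It only flags suspicious reads and does not correct them. The function names (`translate`, `stops_per_frame`, `homopolymer_runs`) and the `min_len=4` threshold are illustrative assumptions, and the standard genetic code is assumed for translation.

```python
import re
from itertools import product

# Standard genetic code, built in TCAG codon order (assumed suitable here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(c) for c in product(BASES, repeat=3)), AMINO))


def translate(dna, frame=0):
    """Translate dna from the given frame offset; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


def stops_per_frame(dna):
    """Count stop codons in each of the three forward frames."""
    return {frame: translate(dna, frame).count("*") for frame in range(3)}


def homopolymer_runs(dna, min_len=4):
    """Return (start, run) for every run of one base at least min_len long."""
    pattern = r"([ACGT])\1{%d,}" % (min_len - 1)
    return [(m.start(), m.group()) for m in re.finditer(pattern, dna.upper())]


if __name__ == "__main__":
    read = "ATGAAAACTGGTG"  # toy read with one extra A in a homopolymer
    print(stops_per_frame(read))
    print(homopolymer_runs(read))
```

A read whose expected frame contains stop codons, or whose translation stops matching the reference protein downstream of a homopolymer run, is a likely candidate for frameshift correction.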

Thanks

error-correction alignment next-gen • 2.6k views

updated 3.1 years ago by Ram 44k • written 10.5 years ago by Daniel ★ 4.0k

score 1 · Accepted Answer · 2014-06-06

I have found the RDP tool 'FrameBot' to be quite good for this. It takes a bit of tweaking, and it seems unwilling to remove some insertions, but it's the best I've come across so far.

https://github.com/rdpstaff/RDPTools

RDP FrameBot is a frameshift correction and nearest neighbor classification tool for use with high-throughput amplicon sequencing. It uses a dynamic programming algorithm to align each query DNA sequence against a set of target protein sequences, produces frameshift-corrected protein and DNA sequences and an optimal global or local protein alignment. It also helps filter out non-target reads. The online version of FrameBot is available on http://fungene.cme.msu.edu/FunGenePipeline. Read the quick tutorial at http://rdp.cme.msu.edu/tutorials/framebot/RDPtutorial_FRAMEBOT.html before you start.
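To illustrate the general idea behind frameshift-aware DNA-to-protein alignment, here is a toy dynamic programming sketch in Python. It is not FrameBot's implementation: the scoring values, the function name `correct_frameshifts`, and the choice to pad a missing base with `N` are all assumptions made for illustration. A real tool would use a protein substitution matrix and more careful gap handling.

```python
from itertools import product

# Standard genetic code, built in TCAG codon order (assumed suitable here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(c) for c in product(BASES, repeat=3)), AMINO))

# Illustrative scores only; these are not FrameBot's parameters.
MATCH, MISMATCH, GAP, FRAMESHIFT = 2, -1, -2, -3


def correct_frameshifts(dna, protein):
    """Globally align dna to protein, allowing 1-base insertions/deletions.

    Returns (corrected_dna, score). Extra bases are dropped; a missing base
    is padded with 'N' so the corrected sequence stays in frame.
    """
    dna, protein = dna.upper(), protein.upper()
    n, m = len(dna), len(protein)
    score = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i >= 3 and j >= 1:  # codon aligned to a residue
                aa = CODON_TABLE.get(dna[i - 3:i], "X")
                s = MATCH if aa == protein[j - 1] else MISMATCH
                options.append((score[i - 3][j - 1] + s, "codon"))
            if i >= 3:             # codon with no counterpart in the protein
                options.append((score[i - 3][j] + GAP, "codon_gap"))
            if j >= 1:             # residue with no counterpart in the read
                options.append((score[i][j - 1] + GAP, "aa_gap"))
            if i >= 1:             # read carries one extra base
                options.append((score[i - 1][j] + FRAMESHIFT, "extra_base"))
            if i >= 2 and j >= 1:  # read lost one base of a codon
                options.append((score[i - 2][j - 1] + FRAMESHIFT, "missing_base"))
            score[i][j], back[i][j] = max(options)

    # Trace back from the end, rebuilding an in-frame DNA sequence.
    pieces, i, j = [], n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "codon":
            pieces.append(dna[i - 3:i])
            i, j = i - 3, j - 1
        elif move == "codon_gap":
            pieces.append(dna[i - 3:i])
            i -= 3
        elif move == "aa_gap":
            j -= 1
        elif move == "extra_base":
            i -= 1  # drop the inserted base
        else:  # missing_base
            pieces.append(dna[i - 2:i] + "N")
            i, j = i - 2, j - 1
    return "".join(reversed(pieces)), score[n][m]


if __name__ == "__main__":
    # Toy read with one extra A in a homopolymer, aligned to a toy protein.
    print(correct_frameshifts("ATGAAAACTGGTG", "MKLV"))
```

In this sketch a frameshift costs more than a mismatch, so the alignment only breaks frame when doing so restores several downstream codon matches. That is the same trade-off that makes homopolymer insertions recoverable from a protein reference.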