Correcting sequence reads based on ORF disruption relative to a reference sequence
10.5 years ago
Daniel ★ 4.0k

I have a set of 454 metagenomic (amplicon) reads from which I am attempting to extract a particular taxon group.

I have done the following:

  • Selected reference sequences for the key taxonomic family (from NCBI, plus some of our own).
  • Created a reference alignment using MUSCLE.
  • Aligned the NGS query sequences to the reference alignment using PyNAST, keeping the sequences that align closely and discarding unrelated taxa.
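The "keep those sequences which align closely" step is essentially an identity cut-off against the reference alignment. Here is a minimal Python sketch of that idea. It assumes the reads and the reference are already gapped rows of the same alignment. The function names and the 75% cut-off are illustrative only; this is not PyNAST's code and not its defaults:

```python
# Hypothetical identity filter over rows of one multiple alignment.
GAP_CHARS = set("-.")


def percent_identity(aligned_query, aligned_ref):
    """Percent identity over columns where neither row has a gap."""
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("rows must come from the same alignment")
    pairs = [(q, r) for q, r in zip(aligned_query.upper(), aligned_ref.upper())
             if q not in GAP_CHARS and r not in GAP_CHARS]
    if not pairs:
        return 0.0
    return 100.0 * sum(q == r for q, r in pairs) / len(pairs)


def keep_close_hits(aligned_queries, aligned_ref, min_identity=75.0):
    """Keep only reads at or above the (illustrative) identity cut-off."""
    return {name: seq for name, seq in aligned_queries.items()
            if percent_identity(seq, aligned_ref) >= min_identity}


ref = "ATG-GCCAAA"
reads = {"read1": "ATG-GCCAAT", "read2": "TTTTTTTTTT"}
print(keep_close_hits(reads, ref))  # {'read1': 'ATG-GCCAAT'}
```

Columns where either row is gapped are ignored, so a read is not penalised just for being shorter than the reference.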

I now want to try to correct the extracted sequences based on the ORF of the functional gene we are using (rbcL). I can see a homopolymer that is causing an insertion, as well as other errors. Is there a way to correct the sequences?
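Why a homopolymer overcall matters so much for a coding gene: one extra base shifts the reading frame, so everything downstream translates to the wrong protein and usually hits a premature stop codon. A small Python sketch that flags long homopolymer runs and internal stop codons is below. It assumes the read is on the coding strand and starts in frame 0. The sequences are a made-up toy example, not real rbcL, and the `min_len=4` threshold is an arbitrary choice:

```python
# Flag homopolymer runs and premature stop codons in a read (toy sketch).
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; plastid table 11 uses the same amino-acid assignments.
CODON_TABLE = {a + b + c: AAS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(dna, frame=0):
    """Translate complete codons from `frame`; unknown codons become X."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def homopolymer_runs(seq, min_len=4):
    """Return (start, base, length) for every run of >= min_len identical bases."""
    runs, start = [], 0
    for i in range(1, len(seq) + 1):
        if i == len(seq) or seq[i] != seq[start]:
            if i - start >= min_len:
                runs.append((start, seq[start], i - start))
            start = i
    return runs


def internal_stops(seq, frame=0):
    """0-based positions of stop codons before the last complete codon."""
    seq = seq.upper()
    codon_starts = list(range(frame, len(seq) - 2, 3))
    return [i for i in codon_starts[:-1] if CODON_TABLE.get(seq[i:i + 3]) == "*"]


ref = "ATGTGGAAAAAAGGCGCTAAGTGG"    # made-up toy sequence with a 6-A run
read = "ATGTGGAAAAAAAGGCGCTAAGTGG"  # same, but the run was overcalled as 7 A
for name, s in (("ref", ref), ("read", read)):
    print(name, translate(s), homopolymer_runs(s), internal_stops(s))
```

The toy reference translates to `MWKKGAKW` with no stops. The read with one extra A translates to `MWKKRR*V`, with a stop codon at nucleotide 18 just downstream of the 7-A run. This is the kind of signature worth looking for before deciding which reads need correcting.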

Thanks

error-correction alignment next-gen
10.5 years ago
Daniel ★ 4.0k

I have found the RDP tool 'FrameBot' to be quite good in this respect. It takes a bit of tweaking, and it seems unwilling to remove some insertions, but it's the best I've come across so far.

https://github.com/rdpstaff/RDPTools

RDP FrameBot is a frameshift correction and nearest neighbor classification tool for use with high-throughput amplicon sequencing. It uses a dynamic programming algorithm to align each query DNA sequence against a set of target protein sequences, produces frameshift-corrected protein and DNA sequences and an optimal global or local protein alignment. It also helps filter out non-target reads. The online version of FrameBot is available on http://fungene.cme.msu.edu/FunGenePipeline. Read the quick tutorial at http://rdp.cme.msu.edu/tutorials/framebot/RDPtutorial_FRAMEBOT.html before you start. 
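The core idea in that description is aligning a DNA read directly against a protein and letting the aligner skip bases to get back into frame. It can be sketched as a small dynamic programme. This is NOT FrameBot's algorithm or scoring. It is a simplified, hypothetical version with these limitations:

- It uses flat, made-up match/mismatch/gap/frameshift scores.
- It handles frameshifts only by dropping 1 or 2 read bases. Dropping 2 bases is also how it copes with a missing base, at the cost of losing that partial codon.
- It assumes the read is on the coding strand.
- It lets the read start and end anywhere along the target protein.

```python
# Toy frameshift-tolerant DNA-vs-protein alignment (NOT FrameBot's algorithm).
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AAS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Made-up flat scores; real tools use a protein substitution matrix.
MATCH, MISMATCH, GAP, FRAMESHIFT = 3, -2, -4, -6


def translate(dna):
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def frameshift_correct(dna, protein):
    """Align a read to a target protein; return (dna, protein, score) in frame."""
    dna, protein = dna.upper(), protein.upper()
    n, m = len(dna), len(protein)
    # Row 0 is all zeros, so the read may start anywhere along the target.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(m + 1):
            options = []
            if i >= 3:
                aa = CODON_TABLE.get(dna[i - 3:i], "X")
                if j >= 1:  # codon aligned to a target residue
                    s = MATCH if aa == protein[j - 1] else MISMATCH
                    options.append((score[i - 3][j - 1] + s, "codon"))
                options.append((score[i - 3][j] + GAP, "ins"))  # extra codon in read
            if j >= 1:  # target residue with no codon in the read
                options.append((score[i][j - 1] + GAP, "del"))
            for k in (1, 2):  # frameshift: drop 1 or 2 read bases
                if i >= k:
                    options.append((score[i - k][j] + FRAMESHIFT, k))
            # First maximum wins, so codon moves are preferred on ties.
            score[i][j], back[i][j] = max(options, key=lambda o: o[0])

    # Trailing target residues are free: end at the best column of the last row.
    j = max(range(m + 1), key=lambda c: score[n][c])
    best, i, kept = score[n][j], n, []
    while i > 0:
        move = back[i][j]
        if move == "codon":
            kept.append(dna[i - 3:i])
            i, j = i - 3, j - 1
        elif move == "ins":
            kept.append(dna[i - 3:i])
            i -= 3
        elif move == "del":
            j -= 1
        else:  # skipped frameshift bases are discarded
            i -= move
    corrected = "".join(reversed(kept))
    return corrected, translate(corrected), best


print(frameshift_correct("ATGTGGAAAAAAAGGCGCTAAGTGG", "MWKKGAKW"))
```

On this made-up read, which has one extra A in a 7-A homopolymer, the sketch drops a single A and returns `('ATGTGGAAAAAAGGCGCTAAGTGG', 'MWKKGAKW', 18)`. The design choice that matters is scoring codons against amino acids rather than bases against bases. With that scoring, a frameshift costs one penalty instead of a long run of mismatches, so the aligner prefers to "repair" the frame at the homopolymer.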