Sanger Paired-End Assembly
13.3 years ago

Hi. I have hundreds of paired-end Sanger reads (length ~800 bp) that I would like to assemble pair by pair. The overlapping region can vary from 20 bp to 150 bp. The difficulty is that some Ns can occur in the overlapping region, and one strand of each pair has to be reverse-complemented. What tool should I use? Thanks a lot.

sanger paired assembly • 6.0k views
13.2 years ago

If I understand your question, you want to assemble two sequences at a time, one from the forward strand and one from the reverse strand, which meet in the middle. Pretty much any overlap-based assembler can do the trick. However, since there are only two sequences, you need to decide how to call consensus bases where the two sequences disagree in the overlapping region.

Important parameters are the maximum allowed hang, the identity cutoff, and the minimum overlap length. Be aware that these parameters affect the outcome, even though you are likely going to use the defaults.

                    overlap  hang
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>                     Forward
                    |||||||||
                    <<<<<<<<<<<<<<<<<<<<<<<<<<<<< Reverse
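
To make the picture above concrete, here is a minimal Python sketch of what an overlap-based merge of one pair does. It is not what CAP3 or merger actually run: it only tries ungapped suffix/prefix overlaps (no indels), and the function names, the defaults (min_overlap=20, min_identity=0.9) and the consensus rule (an N yields to the other read, a real disagreement becomes N) are all my own assumptions for illustration.

```python
# Minimal sketch: merge one forward read with one reverse read.
# All names and default thresholds here are illustrative assumptions.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_identity(a, b):
    """Fraction of agreeing columns, ignoring columns where either base is N."""
    informative = [(x, y) for x, y in zip(a, b) if x != "N" and y != "N"]
    if not informative:
        return 0.0
    return sum(x == y for x, y in informative) / len(informative)


def merge_pair(forward, reverse, min_overlap=20, min_identity=0.9):
    """Merge a forward read with the reverse complement of its mate.

    Tries the longest ungapped suffix/prefix overlap first and returns the
    merged sequence, or None if no overlap passes both cutoffs.
    """
    fwd = forward.upper()
    rev = reverse_complement(reverse.upper())
    for length in range(min(len(fwd), len(rev)), min_overlap - 1, -1):
        tail, head = fwd[-length:], rev[:length]
        if overlap_identity(tail, head) < min_identity:
            continue
        consensus = []
        for x, y in zip(tail, head):
            if x == y:
                consensus.append(x)
            elif x == "N":
                consensus.append(y)  # let the mate resolve an N
            elif y == "N":
                consensus.append(x)
            else:
                consensus.append("N")  # real disagreement: leave ambiguous
        return fwd[:-length] + "".join(consensus) + rev[length:]
    return None
```

Calling merge_pair(forward_read, reverse_read) returns the merged sequence, or None when nothing passes the cutoffs. Real Sanger reads have indels near the ends, so a gapped aligner (which is what the actual tools use) will do better than this.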

For command-line programs, use EMBOSS merger or CAP3. I am not familiar with many GUI options, but I have used Sequencher in the past to assemble a small number of reads.
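
With hundreds of pairs you will probably want to script this. Below is a rough Python sketch that only builds the command lines. The option names (-asequence, -bsequence, -sreverse2, -outfile, -outseq for merger; -o and -p for CAP3) are as I remember them from each tool's help text, so treat them as assumptions and check them against your installed versions. The file names and helper function names are hypothetical.

```python
import subprocess


def merger_command(forward_fasta, reverse_fasta, out_prefix):
    """Build an EMBOSS merger call that reverse-complements the second read."""
    return [
        "merger",
        "-asequence", forward_fasta,
        "-bsequence", reverse_fasta,
        "-sreverse2",  # assumed qualifier: reverse-complement the second input
        "-outfile", out_prefix + ".merger",  # alignment report
        "-outseq", out_prefix + ".fasta",    # merged sequence
    ]


def cap3_command(pair_fasta, min_overlap, min_identity):
    """Build a CAP3 call for a FASTA file holding both reads of one pair."""
    return [
        "cap3", pair_fasta,
        "-o", str(min_overlap),   # assumed: overlap length cutoff
        "-p", str(min_identity),  # assumed: overlap percent identity cutoff
    ]


def run(command):
    """Run one command and raise if the tool exits with an error."""
    subprocess.run(command, check=True)
```

For example, run(cap3_command("pair_001.fasta", 25, 90)) would assemble one pair. CAP3 orients the reads itself, so no explicit reverse-complement step is needed there. Since your overlaps can be as short as 20 bp, the overlap length cutoff is the parameter to look at first, and CAP3 enforces its own lower bound on it.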

If there is a substantial number of bad bases or Ns towards the end of each sequence (which will prevent the two sequences from merging), consider trimming based on quality.
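
If you only have FASTA files and no quality values, a crude stand-in is to trim on N density instead. The sketch below is that substitute, not real quality trimming: it cuts each end back to the first window that contains at most max_n Ns. The function name and the defaults (window=10, max_n=1) are assumptions to tune on your own data.

```python
def trim_n_rich_ends(seq, window=10, max_n=1):
    """Trim both ends until a window of `window` bases holds at most `max_n` Ns.

    Uses N density as a proxy for quality. Returns an empty string if no
    window passes (including sequences shorter than `window`).
    """
    s = seq.upper()

    def first_clean_start(t):
        # Slide a window from the left and stop at the first acceptable one.
        for i in range(len(t) - window + 1):
            if t[i:i + window].count("N") <= max_n:
                # Drop any Ns sitting right at the start of that window.
                while i < len(t) and t[i] == "N":
                    i += 1
                return i
        return len(t)

    start = first_clean_start(s)
    end = len(s) - first_clean_start(s[::-1])  # same scan from the right end
    return s[start:end] if start < end else ""
```

Run this on both reads of a pair before merging. If you have the chromatogram-derived quality files, trimming on those is the better option.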


Thanks. CAP3 is perfect for this.
