Sequence Alignment Algorithm For Phred Phrap
3
1
Entering edit mode
13.8 years ago

Hi I am currently doing research on speeding up of sequence alignment. Could you let me know which algorithm(or the base algorithm) is used for sequence ALIGNMENT (not assembly) in phred phrap.?

Could you also let me know what are the common sequence alignment algorithms in use now-a-days?

alignment algorithm • 4.5k views
ADD COMMENT
2
Entering edit mode

Have you already searched on the Web and in the literature for relevant information? That's an important part of "doing research".

ADD REPLY
1
Entering edit mode

Phred doesn't do alignment, and phrap was the cutting edge back in...the late nineties? Current algorithms typically focus on short reads using compressed suffix arrays (bfindex et al), but for many purposes, old workhorses like BLAST or BLAT are still commonly used.

ADD REPLY
0
Entering edit mode

thanks Ketil. Could you put this as an answer?

ADD REPLY
3
Entering edit mode
13.8 years ago
brentp 24k

take a look at this paper for a good summary.

A survey of sequence alignment algorithms for next-generation sequencing

ADD COMMENT
2
Entering edit mode
13.8 years ago

My understanding is that Phrap is based on Cross_match/SWAT, which are implementations of the Smith-Waterman local alignment algorithm. Cross_match is based on SWAT with banding constraints.

From: http://www.phrap.org/phredphrap/general.html

Swat and cross_match are programs for rapid protein and nucleic acid sequence comparison and database search, based on an efficient implementation of the Smith-Waterman-Gotoh algorithm (Smith & Waterman 1981, Gotoh 1982). By revising the recursion relations slightly and making efficient use of word-packing, we have been able to significantly reduce the number of machine instructions executed per Smith-Waterman matrix cell, resulting in a speed enhancement of 2-3 fold over the previous best implementation (Pearson and Miller, 1992). Swat is about 1 / 10 as fast as BLAST, making it quite adequate for performing database searches on a workstation. Statistical significance of the hits is evaluated based on the empirical score distribution for the search, taking into account dependence of the score variance and mean on the length of the database sequence.

Cross_match uses the same algorithm as swat, but in addition allows the comparison of a pair of sequences to be constrained to bands of the Smith-Waterman matrix that surround one or more matching words in the sequences. This substantially increases speed for large-scale nucleotide sequence comparisons without significantly compromising sensitivity. We have found cross_match useful for the following tasks: comparing a set of reads to a set of vector sequences to produce vector-masked versions of the reads; comparing contig sequences found by two alternative assembly procedures (e.g. phrap and xbap) to each other; and comparing assembled contigs to the final edited cosmid sequence, in retrospective studies of the accuracy of fully automated assembly. We also now routinely use cross_match to find repeats in finished cosmid sequences and mask them out prior to database searches.

ADD COMMENT
2
Entering edit mode

Try writing that comment again; it makes no sense as written :-)

ADD REPLY
0
Entering edit mode

could you also let me know some commonly used algorithms for sequence algorithms?

ADD REPLY
0
Entering edit mode

My recollection from days in a high-throughput sequencing lab match what Casey writes - Cross_match it is.

ADD REPLY
0
Entering edit mode
13.8 years ago
Mara ▴ 70

Clustal and Muscle are common sequence alignment algorithms implemented in many other programs. These are particularly good for multiple sequence alignments containing many sequences. http://www.clustal.org/ http://www.drive5.com/muscle/

ADD COMMENT

Login before adding your answer.

Traffic: 1886 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6