Question

Sequence Alignment Algorithm For Phred Phrap

1

Entering edit mode

13.9 years ago

Higherdefender ▴ 140

Hi I am currently doing research on speeding up of sequence alignment. Could you let me know which algorithm(or the base algorithm) is used for sequence ALIGNMENT (not assembly) in phred phrap.?

Could you also let me know what are the common sequence alignment algorithms in use now-a-days?

alignment algorithm • 4.5k views

ADD COMMENT • link updated 13.8 years ago by brentp 24k • written 13.9 years ago by Higherdefender ▴ 140

2

Entering edit mode

Have you already searched on the Web and in the literature for relevant information? That's an important part of "doing research".

ADD REPLY • link 13.9 years ago by Neilfws 49k

1

Entering edit mode

Phred doesn't do alignment, and phrap was the cutting edge back in...the late nineties? Current algorithms typically focus on short reads using compressed suffix arrays (bfindex et al), but for many purposes, old workhorses like BLAST or BLAT are still commonly used.

ADD REPLY • link 13.9 years ago by Ketil 4.1k

0

Entering edit mode

thanks Ketil. Could you put this as an answer?

ADD REPLY • link 13.9 years ago by Higherdefender ▴ 140

score 3 · Answer 1 · 2011-02-17

3

Entering edit mode

13.9 years ago

brentp 24k

take a look at this paper for a good summary.

A survey of sequence alignment algorithms for next-generation sequencing

ADD COMMENT • link 13.9 years ago by brentp 24k

Ram · Answer 2 · 2011-02-17

My understanding is that Phrap is based on Cross_match/SWAT, which are implementations of the Smith-Waterman local alignment algorithm. Cross_match is based on SWAT with banding constraints.

From: http://www.phrap.org/phredphrap/general.html

Swat and cross_match are programs for rapid protein and nucleic acid sequence comparison and database search, based on an efficient implementation of the Smith-Waterman-Gotoh algorithm (Smith & Waterman 1981, Gotoh 1982). By revising the recursion relations slightly and making efficient use of word-packing, we have been able to significantly reduce the number of machine instructions executed per Smith-Waterman matrix cell, resulting in a speed enhancement of 2-3 fold over the previous best implementation (Pearson and Miller, 1992). Swat is about 1 / 10 as fast as BLAST, making it quite adequate for performing database searches on a workstation. Statistical significance of the hits is evaluated based on the empirical score distribution for the search, taking into account dependence of the score variance and mean on the length of the database sequence.

Cross_match uses the same algorithm as swat, but in addition allows the comparison of a pair of sequences to be constrained to bands of the Smith-Waterman matrix that surround one or more matching words in the sequences. This substantially increases speed for large-scale nucleotide sequence comparisons without significantly compromising sensitivity. We have found cross_match useful for the following tasks: comparing a set of reads to a set of vector sequences to produce vector-masked versions of the reads; comparing contig sequences found by two alternative assembly procedures (e.g. phrap and xbap) to each other; and comparing assembled contigs to the final edited cosmid sequence, in retrospective studies of the accuracy of fully automated assembly. We also now routinely use cross_match to find repeats in finished cosmid sequences and mask them out prior to database searches.

score 0 · Answer 3 · 2011-02-17

0

Entering edit mode

13.9 years ago

Mara ▴ 70

Clustal and Muscle are common sequence alignment algorithms implemented in many other programs. These are particularly good for multiple sequence alignments containing many sequences. http://www.clustal.org/ http://www.drive5.com/muscle/

ADD COMMENT • link 13.9 years ago by Mara ▴ 70