Question

Retrieving Best N Local Alignments

1

Entering edit mode

14.3 years ago

Guidobot ▴ 20

Can anyone point me to a program [C/Python] that runs a Smith-Waterman pairwise local alignment but (unlike EMBOSS/water) has options for one or both of these:

Return the best N hits of query sequence.
Extend the/each local alignment to produce the best full local alignment for the (much smaller) query sequence.

pairwise sequence alignment • 5.3k views

ADD COMMENT • link updated 14.1 years ago by Bill Pearson ★ 1.1k • written 14.3 years ago by Guidobot ▴ 20

2

Entering edit mode

I would consider #2 to be a global alignment

ADD REPLY • link 14.3 years ago by Jeremy Leipzig 23k

0

Entering edit mode

True, true, especially because for poorer matches the full best match can be different. However, the full alignment (w/ EMBOSS/needle) can take much longer and produces much bigger files. I guess question should have been: 2a. A Needleman-Wunsch implementation that just produces alignment and score over shorter sequence matching. 2b. A Needleman-Wunsch implementation that runs faster because an option to limit the DA window (matrix size), e.g. to 2x query length.

But I will also look to FASTA to see if it answers this too. Cheers.

ADD REPLY • link 14.3 years ago by Guidobot ▴ 20

0

Entering edit mode

Yep, exonerate is what I wanted. Love being able to format the output with --ryo.

Unfortunately there seems to be a bug with -bestn (-n) [for query vs. multiple targets] but the --percent option is closer to what I wanted. Also seems relatively slow in --exhaustive mode and (possibly because) I can't seem to disable rev.comp. query alignment.

ADD REPLY • link updated 5.7 years ago by Ram 45k • written 14.3 years ago by Guidobot ▴ 20

1

Entering edit mode

hello Guidobot, this website works differently than a normal forum. If you want to reply to a specific answer, use the comment option. To indicate the best answer to your question, you should use the 'check this answer as the best one' option.

ADD REPLY • link 14.3 years ago by Giovanni M Dall'Olio 28k

0

Entering edit mode

Thanks. I indeed added this in an attempt to mark the thread 'answered'.

ADD REPLY • link 14.3 years ago by Guidobot ▴ 20

5

Entering edit mode

14.3 years ago

Lars Juhl Jensen 11k

If I understand you correctly, point 1 in your question is just asking for a tool that can search a sequence against a sequence database using the Smith-Waterman local alignment algorithm and return be N best hits. The ssearch program from Bill Pearson's famous FASTA package can do that.

I am not sure I understand your second point? Is that not merely a matter of subsequently running a Needleman-Wunsch global alignment for the best N hits?

ADD COMMENT • link 14.3 years ago by Lars Juhl Jensen 11k

2

Entering edit mode

Sorry, but in your question you explicitly asked for a program (C/Python), not for an online resource. I do not get why you ask for that if you are not happy about having to download and build the software. Also you now ask for N best global alignments whereas in your question you asked for N best local alignments.

ADD REPLY • link 14.3 years ago by Lars Juhl Jensen 11k

0

Entering edit mode

Thanks again Jeremy. I'll look at this and see if it's what I need.

ADD REPLY • link 14.3 years ago by Guidobot ▴ 20

0

Entering edit mode

I should have mentioned that I'm working with nucleotides. The FASTA site does not allow retrieval of the best N hits (which is a pity as my PC is a bit of a clunker). It appears I would have to download/build the FASTA-36 version to solve (1). This also has ggsearch for best N global scores but this is only for protein seqs.

If I build FASTA-36 locally do you know if any executable can handle N best global DNA sequence alignments? (Sorry, doc on site is quite limited.)

ADD REPLY • link 14.3 years ago by Guidobot ▴ 20

0

Entering edit mode

Apologies if I've appeared to change the question. I am looking for a python/C program so I can test larger files, although a on-line resource would also be handy right now and I'm grateful for the information. The objective of the both (1)+(2) of my question is to get the best N global alignments. My original thinking was I could do (2) by using the best N local alignments. Cheers.

ADD REPLY • link 14.3 years ago by Guidobot ▴ 20

1

Entering edit mode

14.3 years ago

Michael.James.Clark ▴ 570

It sounds like perhaps BFAST bfast.sourceforge.net) will fill your needs. Read the paper here: http://dx.doi.org/10.1371/journal.pone.0007767

ADD COMMENT • link 14.3 years ago by Michael.James.Clark ▴ 570

0

Entering edit mode

Thanks Michael. This looks great for what I need further down the line. Right now I'm looking to pairwise compare sets of sequences where small sequences (45bp) are expected to hit targets multiple times, but with relatively high numbers of short INDEL and mismatch read errors. A small program that uses NW to return N best alignments would be ideal.

ADD REPLY • link 14.3 years ago by Guidobot ▴ 20

0

Entering edit mode

Ah sorry, I guess I misunderstood. Looks like exonerate might be just what you need then! Good luck!

ADD REPLY • link 14.3 years ago by Michael.James.Clark ▴ 570

1

Entering edit mode

13.1 years ago

Bill Pearson ★ 1.1k

I am also confused by the question. The ssearch36 program, available at fasta.bioch.virginia.edu/ will compare two sequences and give you local alignments a "query" sequence with a "target" sequence; you should be able to see an arbitrary number of alignments by modifying the expect threshold. ssearch36 will make certain that no part of the "target" sequence is "re-used". If you select the "compare two sequences" button from the ssearch library search page, you can compare two DNA sequences (you can only search protein libraries).

Alternatively, the lalign36 program can also show multiple non-overlapping alignments, but lalign is capable of reusing the target sequence. Thus, if a protein (or DNA sequence) contains four homologous domains A1 A2 A3 A4, then all possible significant alignments, e.g. A1 A2 A3 : A2 A3 A4, A1 A2 : A3 A4, and A1:A4 are shown (A1 A2 A3 A4 : A1 A2 A3 A4 is not shown, because that is the identity alignment). lalign36 also works with either protein or DNA sequences.

I do not understand the question about extending the alignment outside the local region.

ADD COMMENT • link 13.1 years ago by Bill Pearson ★ 1.1k

Ram · Accepted Answer · 2011-01-27

5

Entering edit mode

14.3 years ago

Giovanni M Dall'Olio 28k

I like exonerate, which can be run with the affine:local and -n options to get what you want. It is a nice tool that runs from the command line and does not need to format databases or further configurations

$: exonerate -q querysequence.txt -t targetsequences.txt -m affine:local -n 3 # n=3 to get the 3 best alignments.

ADD COMMENT • link updated 5.7 years ago by Ram 45k • written 14.3 years ago by Giovanni M Dall'Olio 28k

0

Entering edit mode

Wow. Thanks! I checked out the man page and this looks perfect, and then some.

ADD REPLY • link 14.3 years ago by Guidobot ▴ 20

0

Entering edit mode

The affine:best (N) alignments was what I needed. In fact affine:global appears to be of no use for small query sequences searches. Have to be real careful with parameters, e.g. runs appear to go on indefinitely if you use +ve gapopen values. The -bestn option does not appear to be working as advertised.

ADD REPLY • link 14.3 years ago by Guidobot ▴ 20