Can anyone point me to a program [C/Python] that runs a Smith-Waterman pairwise local alignment but (unlike EMBOSS/water) has options for one or both of these:
Return the best N hits of query sequence.
Extend the/each local alignment to produce the best full local alignment for the (much smaller) query sequence.
True, true, especially because for poorer matches the full best match can be different. However, the full alignment (w/ EMBOSS/needle) can take much longer and produces much bigger files.
I guess question should have been:
2a. A Needleman-Wunsch implementation that just produces alignment and score over shorter sequence matching.
2b. A Needleman-Wunsch implementation that runs faster because an option to limit the DA window (matrix size), e.g. to 2x query length.
But I will also look to FASTA to see if it answers this too. Cheers.
Yep, exonerate is what I wanted. Love being able to format the output with --ryo.
Unfortunately there seems to be a bug with -bestn (-n) [for query vs. multiple targets] but the --percent option is closer to what I wanted. Also seems relatively slow in --exhaustive mode and (possibly because) I can't seem to disable rev.comp. query alignment.
ADD REPLY
• link
updated 5.3 years ago by
Ram
44k
•
written 13.9 years ago by
Guidobot
▴
20
1
Entering edit mode
hello Guidobot, this website works differently than a normal forum. If you want to reply to a specific answer, use the comment option. To indicate the best answer to your question, you should use the 'check this answer as the best one' option.
I like exonerate, which can be run with the affine:local and -n options to get what you want. It is a nice tool that runs from the command line and does not need to format databases or further configurations
$: exonerate -q querysequence.txt -t targetsequences.txt -m affine:local -n 3 # n=3 to get the 3 best alignments.
The affine:best (N) alignments was what I needed. In fact affine:global appears to be of no use for small query sequences searches. Have to be real careful with parameters, e.g. runs appear to go on indefinitely if you use +ve gapopen values. The -bestn option does not appear to be working as advertised.
If I understand you correctly, point 1 in your question is just asking for a tool that can search a sequence against a sequence database using the Smith-Waterman local alignment algorithm and return be N best hits. The ssearch program from Bill Pearson's famous FASTA package can do that.
I am not sure I understand your second point? Is that not merely a matter of subsequently running a Needleman-Wunsch global alignment for the best N hits?
Sorry, but in your question you explicitly asked for a program (C/Python), not for an online resource. I do not get why you ask for that if you are not happy about having to download and build the software. Also you now ask for N best global alignments whereas in your question you asked for N best local alignments.
I should have mentioned that I'm working with nucleotides. The FASTA site does not allow retrieval of the best N hits (which is a pity as my PC is a bit of a clunker). It appears I would have to download/build the FASTA-36 version to solve (1). This also has ggsearch for best N global scores but this is only for protein seqs.
If I build FASTA-36 locally do you know if any executable can handle N best global DNA sequence alignments? (Sorry, doc on site is quite limited.)
Apologies if I've appeared to change the question. I am looking for a python/C program so I can test larger files, although a on-line resource would also be handy right now and I'm grateful for the information. The objective of the both (1)+(2) of my question is to get the best N global alignments. My original thinking was I could do (2) by using the best N local alignments. Cheers.
Thanks Michael. This looks great for what I need further down the line. Right now I'm looking to pairwise compare sets of sequences where small sequences (45bp) are expected to hit targets multiple times, but with relatively high numbers of short INDEL and mismatch read errors. A small program that uses NW to return N best alignments would be ideal.
I am also confused by the question. The ssearch36 program, available at fasta.bioch.virginia.edu/ will compare two sequences and give you local alignments a "query" sequence with a "target" sequence; you should be able to see an arbitrary number of alignments by modifying the expect threshold. ssearch36 will make certain that no part of the "target" sequence is "re-used". If you select the "compare two sequences" button from the ssearch library search page, you can compare two DNA sequences (you can only search protein libraries).
Alternatively, the lalign36 program can also show multiple non-overlapping alignments, but lalign is capable of reusing the target sequence. Thus, if a protein (or DNA sequence) contains four homologous domains A1 A2 A3 A4, then all possible significant alignments, e.g. A1 A2 A3 : A2 A3 A4, A1 A2 : A3 A4, and A1:A4 are shown (A1 A2 A3 A4 : A1 A2 A3 A4 is not shown, because that is the identity alignment). lalign36 also works with either protein or DNA sequences.
I do not understand the question about extending the alignment outside the local region.
I would consider #2 to be a global alignment
True, true, especially because for poorer matches the full best match can be different. However, the full alignment (w/ EMBOSS/needle) can take much longer and produces much bigger files. I guess question should have been: 2a. A Needleman-Wunsch implementation that just produces alignment and score over shorter sequence matching. 2b. A Needleman-Wunsch implementation that runs faster because an option to limit the DA window (matrix size), e.g. to 2x query length.
But I will also look to FASTA to see if it answers this too. Cheers.
Yep, exonerate is what I wanted. Love being able to format the output with
--ryo
.Unfortunately there seems to be a bug with
-bestn
(-n
) [for query vs. multiple targets] but the--percent
option is closer to what I wanted. Also seems relatively slow in--exhaustive
mode and (possibly because) I can't seem to disable rev.comp. query alignment.hello Guidobot, this website works differently than a normal forum. If you want to reply to a specific answer, use the comment option. To indicate the best answer to your question, you should use the 'check this answer as the best one' option.
Thanks. I indeed added this in an attempt to mark the thread 'answered'.