I want to compare two sequences and see how similar they are. I'm thinking of doing a pairwise sequence alignment. I can do this one pair at a time using CLUSTALW, but there seems to be no score/measure of how good the alignment is. Also, I have hundreds of sequence pairs, so I need a tool that can handle all the pairs.
I'm wondering if anyone has recommendations on the available tools to use?
Thanks!
Thanks for those tool recommendations!
I have found FASTA and tried it. It outputs a similarity score, a percentage identity and a visual alignment. I think it's good enough for my purpose.
I do have a question on your answer. My sequences are kind of random sequences. How do I decide whether to use local/local, global/global or global/local pairwise alignment?
Thanks!
The choice between local/local, global/global and global/local is driven by the nature of the sequences.
Global/global (or just plain 'global') aligns the sequences from end to end, and so suits cases where you expect the sequences to be similar over their whole length and co-linear (i.e. little to no rearrangement). This works well with closely related sequences.
Local/local (or just plain 'local') finds regions of similarity, and thus copes with rearrangements and with alignments to sub-sequences/super-sequences, since it does not require end-to-end similarity. This flexibility is why general purpose sequence similarity search methods, such as BLAST and FASTA, use local pairwise alignments (and some nifty statistics) to find database sequences which are similar to a query sequence. The downside of local alignments is that they are local. While gapping, drop-off and HSPs mitigate this, there are still cases where the local alignment provides insufficient overlap between the two sequences.
Global/local provides a hybrid option which gives end-to-end coverage in one sequence (for GLSEARCH this is the query) while the other sequence need only provide a local region. This is great when searching with sequence fragments, since it provides complete coverage of the query while allowing for the hit to be much longer than the query. This approach is also used in sequence mapping tools, where the aim is to map a short(er) sequence on to a long(er) sequence.
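To make the difference between the three modes concrete, here is a minimal sketch in plain Python. It is not how FASTA, BLAST or GLSEARCH are implemented: it is a toy dynamic programming scorer with a linear gap penalty, and the function name `align_score`, the mode name "glocal" and the scoring values are all my own assumptions for illustration. The only things that change between modes are which end gaps are free and where the final score is read from.

```python
def align_score(query, target, mode="local", match=2, mismatch=-1, gap=-2):
    """Best alignment score for a toy scoring scheme (linear gap penalty).

    mode is one of:
      "global" - both sequences aligned end to end
      "local"  - best-scoring pair of sub-regions
      "glocal" - query end to end, target only locally (global/local)
    """
    if mode not in ("global", "local", "glocal"):
        raise ValueError(f"unknown mode: {mode}")

    # Row 0: skipping a prefix of the target is penalised only in global mode.
    prev = [j * gap if mode == "global" else 0 for j in range(len(target) + 1)]
    best = 0  # best cell seen so far, only used in local mode

    for i, q in enumerate(query, start=1):
        # Column 0: skipping a prefix of the query is free only in local mode.
        cur = [0 if mode == "local" else i * gap]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            score = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if mode == "local":
                score = max(score, 0)  # a local alignment may restart anywhere
                best = max(best, score)
            cur.append(score)
        prev = cur

    if mode == "local":
        return best
    if mode == "glocal":
        return max(prev)  # query fully used, target suffix skipped for free
    return prev[-1]       # global: both sequences fully used


# A short fragment against a longer sequence that contains it.
fragment = "ACGT"
longer = "TTTACGTTTT"
for mode in ("global", "local", "glocal"):
    print(mode, align_score(fragment, longer, mode))
```

With these toy scores the fragment gets the same score in local and global/local mode, while the global score is dragged down by the gaps needed to cover the rest of the longer sequence. That is the sort of behaviour to keep in mind when picking a mode for your own pairs.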
So, as a rule of thumb:
For a first pass, you are probably best off going with local alignments. Then, depending on those results, you may want to examine a subset using global or global/local alignment.
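Since you have hundreds of pairs, that first pass could be scripted along these lines. This is only a rough sketch under stated assumptions: `local_score` is a toy Smith-Waterman-style scorer rather than the FASTA implementation, `first_pass`, the `threshold` value and the normalisation (local score divided by the best score the shorter sequence could achieve) are hypothetical choices of mine, and the result is not a statistical significance measure like an E-value.

```python
def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (toy scoring, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Scores are clamped at 0 so an alignment can start anywhere.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best


def first_pass(pairs, threshold=0.5, match=2):
    """Score every (name, seq_a, seq_b) pair with a local alignment.

    Returns (name, fraction, follow_up) tuples, where fraction is the local
    score relative to a perfect match over the shorter sequence, and
    follow_up marks pairs worth a closer look. The threshold is arbitrary.
    """
    results = []
    for name, a, b in pairs:
        shorter = min(len(a), len(b))
        frac = local_score(a, b, match=match) / (match * shorter) if shorter else 0.0
        results.append((name, round(frac, 2), frac >= threshold))
    return results


# Made-up example sequences, purely for illustration.
pairs = [
    ("pair1", "ACGTACGTAC", "ACGTACGTAC"),
    ("pair2", "ACGTACGTAC", "TTTTTTTTTT"),
]
for name, frac, follow_up in first_pass(pairs):
    print(name, frac, "follow up" if follow_up else "skip")
```

In practice you would let a proper tool do the alignment and statistics and only keep the looping and filtering logic; the flagged subset is then what you would rerun with global or global/local alignment.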