Very simple question – I have a list of, say, 1000 protein sequences in a FASTA file. I want to compute the percent identity for each pair of sequences, so in this example it would generate a 1000 by 1000 matrix that is symmetric about the diagonal.
I've tried diamond and blastp, but even when I set -max-target-seqs to 0 or set the e-value and minimum percent identity thresholds to appropriately high/low values, they still don't report all possible alignments. It looks like some filtering is still going on. Hopefully I'm just missing a parameter?
Any recommendations for how to get this done quickly? I could code it myself, but I'd rather have one of these tools do it for me, since they'll presumably be faster at full scale.
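For reference, the all-vs-all computation itself is simple to express: n(n-1)/2 global alignments (499,500 for 1,000 sequences), each mirrored into the matrix. Below is a minimal pure-Python sketch, assuming toy scoring (+1 match, -1 mismatch, -2 per gap position) rather than the substitution matrix and affine gap penalties a real aligner would use, and assuming percent identity is defined as identical columns divided by alignment length. Other definitions (e.g. dividing by the shorter sequence length) give different numbers. It is illustrative only; pure Python is far too slow for this at full scale.

```python
from itertools import combinations

# Assumed toy scoring, NOT what needle/CLUSTAL use (they use substitution
# matrices and affine gaps). Linear gap cost per gap position.
MATCH, MISMATCH, GAP = 1, -1, -2


def global_align(a: str, b: str) -> tuple[str, str]:
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            score[i][j] = max(diag, score[i - 1][j] + GAP, score[i][j - 1] + GAP)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            MATCH if a[i - 1] == b[j - 1] else MISMATCH
        ):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical columns / alignment length (gap columns included), in percent."""
    x, y = global_align(a, b)
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x) if x else 100.0


def identity_matrix(seqs: list[str]) -> list[list[float]]:
    """All-vs-all matrix: each unordered pair is aligned once, then mirrored."""
    n = len(seqs)
    mat = [[100.0] * n for _ in range(n)]  # a sequence is 100% identical to itself
    for i, j in combinations(range(n), 2):
        mat[i][j] = mat[j][i] = percent_identity(seqs[i], seqs[j])
    return mat
```

For example, `percent_identity("ACDEF", "ACDE")` aligns to `ACDEF`/`ACDE-`, giving 4 identical columns out of 5, i.e. 80.0.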
Thanks in advance.
Thank you – ended up going the CLUSTAL route with some basic multi-threading to speed things up. Very much appreciate your help!
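The actual multi-threading code isn't shown in the thread. As a rough sketch of the general pattern only, assuming a single pairwise scoring function is mapped over all unordered pairs with Python's `concurrent.futures`: `all_vs_all_parallel` and `naive_ungapped_identity` are hypothetical names, and the latter is a placeholder that does no alignment at all; swap in a real aligner call.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from itertools import combinations
from typing import Callable, Sequence


def all_vs_all_parallel(
    seqs: Sequence[str],
    pair_fn: Callable[[str, str], float],
    executor: Executor,
    diagonal: float = 100.0,
) -> list[list[float]]:
    """Score every unordered pair once on `executor` and fill a symmetric matrix."""
    n = len(seqs)
    mat = [[diagonal] * n for _ in range(n)]
    pairs = list(combinations(range(n), 2))
    # Executor.map preserves input order, so results line up with `pairs`.
    scores = executor.map(
        pair_fn, [seqs[i] for i, _ in pairs], [seqs[j] for _, j in pairs]
    )
    for (i, j), s in zip(pairs, scores):
        mat[i][j] = mat[j][i] = s
    return mat


def naive_ungapped_identity(a: str, b: str) -> float:
    """Hypothetical placeholder scorer: position-by-position identity over the
    shorter length, with NO alignment. Replace with a real aligner."""
    k = min(len(a), len(b))
    if k == 0:
        return 0.0
    return 100.0 * sum(x == y for x, y in zip(a, b)) / k


if __name__ == "__main__":
    seqs = ["MKTAYIAK", "MKTAYLAK", "GGGG"]
    # Threads help when pair_fn shells out to an external aligner, because the
    # GIL is released while waiting on the subprocess. For a pure-Python
    # pair_fn, use ProcessPoolExecutor instead (pair_fn must then be a
    # picklable, top-level function).
    with ThreadPoolExecutor(max_workers=4) as pool:
        for row in all_vs_all_parallel(seqs, naive_ungapped_identity, pool):
            print(row)
```

Passing the executor in, rather than creating it inside the function, keeps the choice between threads and processes with the caller.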
No problem ;)
CLUSTAL and MAFFT are multiple sequence alignment programs, but when used as pairwise aligners they perform exact dynamic programming (DP), the same as water and needle (edit: jrj.healey says CLUSTAL's alignments are a bit janky, so they may not be exact). In other words, I think CLUSTAL and MAFFT are overkill here. But that's OK, because I guess the results would be the same when the same parameters are used.
Clustal begins by performing all-vs-all pairwise alignments anyway, but other threads on the site have shown that its pairwise alignments can be a bit janky. Using needle or water directly is likely a good approach too. In my experience, MAFFT gives better DNA alignments than CLUSTAL does anyway. (Those were just the first two that came to mind; there are many other aligners to experiment with :) )
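One practical difference between needle and water: needle is a global (Needleman-Wunsch) aligner, while water is a local (Smith-Waterman) aligner that reports only the best-matching segment, so an identity computed over a local alignment can be much higher for proteins that share a single domain. To illustrate what the local variant computes, here is a minimal pure-Python Smith-Waterman sketch, assuming toy scoring (+2 match, -1 mismatch, -2 per gap position) rather than water's actual defaults, and assuming identity is measured over the aligned segment only.

```python
# Assumed toy scoring, NOT water's defaults (it uses a substitution matrix
# and affine gap penalties).
MATCH, MISMATCH, GAP = 2, -1, -2


def local_align(a: str, b: str) -> tuple[str, str]:
    """Smith-Waterman local alignment; returns the best-scoring gapped segments."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            # Flooring at zero lets a new local alignment start anywhere.
            score[i][j] = max(0, diag, score[i - 1][j] + GAP, score[i][j - 1] + GAP)
            if score[i][j] > best:
                best, best_i, best_j = score[i][j], i, j
    # Trace back from the best cell until a zero-score cell is reached.
    out_a, out_b = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + (
            MATCH if a[i - 1] == b[j - 1] else MISMATCH
        ):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def local_identity(a: str, b: str) -> float:
    """Percent identity over the local alignment only (0.0 if nothing aligns)."""
    x, y = local_align(a, b)
    if not x:
        return 0.0
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x)
```

For example, `local_align("GGGGMKTAYIAKGGGG", "WWMKTAYIAKWW")` returns just the shared `MKTAYIAK` segment, so the local identity is 100.0 even though the full-length sequences differ substantially.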
Oh, really? I didn't know that. Thank you for the comment.
I'm mainly referring to this thread: "What is the problem with using Clustal to do pairwise alignment?"
But it's far from a systematic evaluation, so YMMV.
http://msb.embopress.org/content/7/1/539
It seems that Clustal Omega uses HHalign.
ClustalW uses DP by default.
http://www.clustal.org/download/clustalw_help.txt
Ah ok, thank you. Will give the others you recommended a try as well if need be. Really appreciate all the help!