I am looking for a tool for "clustering + cleaning". Clustal is cool, but it does not separate the alignments into clusters.
Not sure what you mean by 'cleaning'. Redundancy reduction in terms of sequence similarity? A common tool for protein sequences is CD-HIT, which does exactly that and also returns sequence clusters. Since you didn't mention which sequences you'd like to clean (aa or genomic?), I'm not sure if that helps you.
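To make "redundancy reduction that also returns clusters" concrete, here is a rough Python sketch of greedy incremental clustering, the general idea behind tools of this kind: sort sequences longest first, and let each one either join the first cluster whose representative it resembles or start a new cluster. This is not CD-HIT itself. The function names and the 0.9 threshold are made up for illustration, and `difflib.SequenceMatcher.ratio()` is only a crude stand-in for a real alignment-based identity.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Crude stand-in for sequence identity (assumption, not a real alignment):
    # 2 * matched characters / (len(a) + len(b)).
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs, threshold=0.9):
    """Cluster sequences greedily.

    seqs: dict mapping name -> sequence string.
    Returns a list of clusters; each cluster is a list of names
    with the representative (longest member) first.
    """
    # Longest sequences first, so each representative is the longest member.
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    clusters = []
    for name in order:
        for cluster in clusters:
            representative = cluster[0]
            if similarity(seqs[representative], seqs[name]) >= threshold:
                cluster.append(name)
                break
        else:
            # No existing representative is similar enough: new cluster.
            clusters.append([name])
    return clusters
```

Keeping only the first name of each cluster gives the "cleaned" (non-redundant) set, while the full lists are the clusters. For real data a dedicated tool will be far faster and will use a proper identity measure.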
Chris
If you are looking to identify pools of similar sequences you should have a look at the answers to this BioStar question "How to find sequences with a given number of mismatches". Even if you are looking just to exclude non-matching sequences (rather than have a very gapped alignment as Clustal sometimes produces) these approaches will help you. Play with cutoffs until you have just the "matching" sequences aligned. You can even re-align in your favourite standard alignment programme afterwards if you want.
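As one simple reading of "a given number of mismatches", here is a small Python sketch that keeps only the sequences within a mismatch cutoff of a chosen reference. It assumes ungapped sequences of equal length (plain Hamming distance) and is not taken from the linked answers; the function names and the example cutoff are hypothetical.

```python
def mismatches(a, b):
    # Hamming distance: number of positions where the two sequences differ.
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


def within_cutoff(reference, seqs, max_mismatches):
    """Return the subset of seqs (dict name -> sequence) that has the same
    length as the reference and differs from it in at most max_mismatches
    positions. Sequences of a different length are skipped."""
    return {
        name: seq
        for name, seq in seqs.items()
        if len(seq) == len(reference)
        and mismatches(reference, seq) <= max_mismatches
    }
```

Calling `within_cutoff(reference, seqs, 1)`, then again with 2, 3 and so on, is the "play with cutoffs" step: raise the cutoff until the set contains the sequences you consider matching, then re-align that set.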
If you can use Clustal, you can build a tree and read the clusters off its branches.