Is There Any Program That, Given A Set Of Sequences, Makes Agglomerative Clustering And Removes The Sequences That Do Not Align?
14.1 years ago
Eminencegrise ▴ 210

I am looking for a tool for "clustering + cleaning". Clustal is cool, but it does not separate the alignments into clusters.

clustering multiple • 2.9k views

If you can use Clustal, you can build a tree.
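
To illustrate how a tree-like grouping turns into clusters, here is a minimal sketch of single-linkage agglomerative clustering over an existing alignment. It is not Clustal's own method: the function names (`p_distance`, `single_linkage`), the distance measure (fraction of differing columns) and the `cutoff` value are all assumptions made for illustration. Sequences that end up alone in a cluster are the ones that "do not align" with anything else at that cutoff.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing columns between two aligned sequences.

    Columns where both sequences have a gap are ignored (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not cols:
        return 1.0
    return sum(x != y for x, y in cols) / len(cols)


def single_linkage(aligned, cutoff):
    """Merge the two closest clusters until the closest pair exceeds cutoff."""
    clusters = [{name} for name in aligned]

    def linkage(c1, c2):
        # Single linkage: distance between the closest members.
        return min(p_distance(aligned[a], aligned[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) > cutoff:
            break
        clusters[i] |= clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Hypothetical toy alignment, for illustration only.
aligned = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAT",
    "s3": "ACGAACGTAC",
    "s4": "TTGCA-GGCA",
    "s5": "TTGCATGGCA",
    "s6": "GGGGGCCCCC",
}
clusters = single_linkage(aligned, cutoff=0.25)
outliers = [next(iter(c)) for c in clusters if len(c) == 1]
```

With this toy data, `outliers` holds the one sequence that joins no cluster, and the remaining clusters could each be re-aligned separately.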

14.1 years ago
Chris ★ 1.6k

Not sure what you mean by 'cleaning'. Redundancy reduction in terms of sequence similarity? A common tool for protein sequences is CD-HIT, which does exactly that and also returns sequence clusters. Since you did not mention which sequences you'd like to clean (aa or genomic?), I'm not sure if that helps you.

Chris
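
For readers unfamiliar with the idea, the following is a toy sketch of greedy incremental clustering in the same spirit as CD-HIT: sequences are processed longest first, and each one either joins the first representative it resembles or becomes a new representative. This is not CD-HIT's implementation. The similarity measure (`difflib.SequenceMatcher.ratio`), the function names and the threshold are assumptions chosen so the example runs with the standard library only.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity in [0, 1]; a real tool would use alignment identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs, threshold):
    """Return {representative_name: [member_names]} for a dict of sequences."""
    # Longest sequences first, so they become cluster representatives.
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    clusters = {}
    for name in order:
        for rep in clusters:
            if similarity(seqs[rep], seqs[name]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative is similar enough: start a new cluster.
            clusters[name] = [name]
    return clusters


# Hypothetical toy sequences, for illustration only.
seqs = {
    "p1": "MKTAYIAKQRQISFVKSHFSRQ",
    "p2": "MKTAYIAKQRQISFVKSHFSR",
    "p3": "WLNGCDPEWLNGCDPEWLNG",
    "p4": "WLNGCDPEWLNGCDPEWLN",
}
clusters = greedy_cluster(seqs, threshold=0.9)
representatives = list(clusters)
```

Keeping only `representatives` gives the redundancy-reduced set, while `clusters` records which sequences were folded into each representative.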

14.1 years ago
Dave Lunt ★ 2.0k

If you are looking to identify pools of similar sequences you should have a look at the answers to this BioStar question "How to find sequences with a given number of mismatches". Even if you are looking just to exclude non-matching sequences (rather than have a very gapped alignment as Clustal sometimes produces) these approaches will help you. Play with cutoffs until you have just the "matching" sequences aligned. You can even re-align in your favourite standard alignment programme afterwards if you want.
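
As a rough sketch of "playing with cutoffs", the snippet below splits sequences into matching and non-matching sets by counting mismatches against a reference. It assumes equal-length sequences (a plain Hamming distance), which may differ from the approaches in the linked question; `mismatches`, `split_by_cutoff` and the toy data are hypothetical names and values used only for illustration.

```python
def mismatches(a, b):
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def split_by_cutoff(reference, candidates, max_mismatches):
    """Return (kept, dropped) dicts according to the mismatch cutoff."""
    kept, dropped = {}, {}
    for name, seq in candidates.items():
        target = kept if mismatches(reference, seq) <= max_mismatches else dropped
        target[name] = seq
    return kept, dropped


# Hypothetical toy data, for illustration only.
reference = "ACGTACGT"
candidates = {
    "a": "ACGTACGT",
    "b": "ACGTACGA",
    "c": "ACCTACGA",
    "d": "TTTTTTTT",
}

# Try a few cutoffs and see how many sequences survive each one.
survivors = {
    cutoff: sorted(split_by_cutoff(reference, candidates, cutoff)[0])
    for cutoff in (0, 1, 2)
}
```

The sequences kept at the chosen cutoff can then be written out and re-aligned with a standard alignment program.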
