Clustering by sequence alignments

7.6 years ago

bbb ▴ 70

Which clustering method is best suited to clustering DNA from different species based on alignment information (matches, deletions, insertions)? For example, given a reference sequence 4000 bp long, the feature set is 4000 × |{bases from reads that matched exactly, insertions, deletions}| = 4000 × 3 = 12000 features.
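To make the feature layout in the question concrete, here is a minimal sketch of how such a vector could be assembled. The `(position, kind)` event representation, the `KINDS` ordering, and the function name `alignment_features` are assumptions made for illustration; the question does not say how the alignment information is stored.

```python
from collections import Counter

# Hypothetical event kinds; their order fixes the column layout per position.
KINDS = ("match", "insertion", "deletion")


def alignment_features(events, ref_len):
    """Flatten per-position event counts into a vector of length 3 * ref_len.

    `events` is an iterable of (reference_position, kind) pairs, an assumed
    representation of what was parsed from the read alignments.
    """
    counts = Counter(events)
    # Counter returns 0 for (position, kind) pairs that never occurred.
    return [counts[(pos, kind)] for pos in range(ref_len) for kind in KINDS]


ref_len = 4000
events = [(0, "match"), (0, "match"), (1, "deletion"), (2, "insertion")]
features = alignment_features(events, ref_len)
print(len(features))  # 12000
```

One such vector per sample (species) gives a samples × 12000 matrix, which is the usual input shape for clustering or for computing pairwise distances.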

dna alignment clustering • 1.6k views

updated 5.7 years ago by Biostar 20 • written 7.6 years ago by bbb ▴ 70

What about CD-HIT?

7.6 years ago by Buffo ★ 2.4k

CD-HIT is very good at removing redundancy but is not adequate for clustering. I didn't fully understand the question, though. For clustering you need a similarity or distance metric.
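As a rough illustration of the point about needing a metric, the sketch below turns per-sample feature vectors into a pairwise distance matrix. The Manhattan (L1) distance is an assumption chosen for simplicity, not something recommended in this thread, and the toy vectors are made up; real count data would likely need normalizing for coverage first.

```python
def manhattan(u, v):
    """Sum of absolute per-feature differences between two vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def distance_matrix(vectors):
    """Symmetric matrix of pairwise Manhattan distances."""
    n = len(vectors)
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = manhattan(vectors[i], vectors[j])
    return dist


# Toy feature vectors for three hypothetical samples.
samples = [[2, 0, 0, 1], [2, 0, 1, 1], [0, 3, 0, 0]]
dist = distance_matrix(samples)
for row in dist:
    print(row)
```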

7.6 years ago by abascalfederico ★ 1.2k

Starting with distance matrices, affinity propagation clustering has worked quite nicely for me.
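A minimal sketch of that approach, assuming scikit-learn's `AffinityPropagation` is used (the reply does not name an implementation). Affinity propagation works on similarities, so the distances are negated before fitting; the toy distance matrix, the `damping` value, and `random_state` are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Toy data: six samples placed on a line, forming two well-separated groups.
positions = np.array([0.0, 1.0, 2.5, 20.0, 21.2, 23.0])

# Pairwise distance matrix; in practice this would come from the alignments.
distances = np.abs(positions[:, None] - positions[None, :])

# Affinity propagation expects similarities, so use negated distances.
similarities = -distances

model = AffinityPropagation(affinity="precomputed", damping=0.7, random_state=0)
labels = model.fit_predict(similarities)
print(labels)
```

The number of clusters is not set in advance; it is driven by the `preference` parameter, which defaults to the median of the input similarities and may need tuning on real data.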

7.6 years ago by 5heikki 11k
