Clustering In Perl
13.5 years ago
Chad A. Davis

I'm currently using Algorithm::Cluster, which is based on the C Clustering Library, to cluster sequences and structures in Perl. Algorithm::Cluster provides many clustering facilities, including hierarchical clustering: given the desired number of clusters, it builds the tree and cuts it into that many clusters. What I need, however, is to cut the tree at a distance threshold instead, so that either all members of a cluster are at most X apart, or any two members of different clusters are at least X apart.
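The two threshold conditions can be stated as checks on a finished partition. The first (largest within-cluster distance <= X) is what cutting a complete-linkage tree at height X guarantees. The second (smallest between-cluster distance >= X) is what a single-linkage cut at X guarantees. Below is a minimal pure-Perl sketch, assuming the distances are a full symmetric matrix (an array of array refs) and the partition is one cluster ID per element. The sub names max_within and min_between are hypothetical and are not part of Algorithm::Cluster.

```perl
use strict;
use warnings;

# Largest distance between two members of the same cluster
# (0 if every cluster is a singleton).
sub max_within {
    my ($dist, $ids) = @_;
    my $max = 0;
    for my $i (0 .. $#$ids) {
        for my $j ($i + 1 .. $#$ids) {
            next unless $ids->[$i] == $ids->[$j];
            $max = $dist->[$i][$j] if $dist->[$i][$j] > $max;
        }
    }
    return $max;
}

# Smallest distance between members of different clusters
# (undef if there is only one cluster).
sub min_between {
    my ($dist, $ids) = @_;
    my $min;
    for my $i (0 .. $#$ids) {
        for my $j ($i + 1 .. $#$ids) {
            next if $ids->[$i] == $ids->[$j];
            $min = $dist->[$i][$j] if !defined $min || $dist->[$i][$j] < $min;
        }
    }
    return $min;
}

# Example: two tight pairs that are far from each other.
my @dist = (
    [0, 1, 5, 6],
    [1, 0, 5, 6],
    [5, 5, 0, 2],
    [6, 6, 2, 0],
);
my @ids = (0, 0, 1, 1);
my $x   = 3;
printf "within-cluster distances <= %g: %s\n", $x,
    max_within(\@dist, \@ids) <= $x ? "yes" : "no";
printf "between-cluster distances >= %g: %s\n", $x,
    min_between(\@dist, \@ids) >= $x ? "yes" : "no";
```

For this example both checks print "yes": the largest within-cluster distance is 2 and the smallest between-cluster distance is 5.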

Is this possible in Algorithm::Cluster? Or is there another (Perl) module that would, given a distance matrix and a threshold, determine the appropriate number of clusters and their members?
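For the second condition alone, no tree is strictly needed: single-linkage clusters at threshold X are exactly the connected components of the graph that links every pair whose distance is <= X, so the number of clusters falls out on its own. A minimal pure-Perl sketch using union-find, assuming a full symmetric distance matrix; threshold_clusters is a hypothetical name, not an Algorithm::Cluster function. It examines every pair once, so it is O(n^2) in the number of elements.

```perl
use strict;
use warnings;

# Single-linkage clustering at a distance threshold.
# $dist: full symmetric distance matrix (array of array refs).
# Returns one cluster ID per element, numbered 0, 1, 2, ...
# in order of first appearance.
sub threshold_clusters {
    my ($dist, $thresh) = @_;
    my $n = @$dist;
    my @parent = (0 .. $n - 1);

    # Find the root of $i's set, compressing the path along the way.
    my $find = sub {
        my ($i) = @_;
        my $root = $i;
        $root = $parent[$root] while $parent[$root] != $root;
        while ($parent[$i] != $root) {
            my $next = $parent[$i];
            $parent[$i] = $root;
            $i = $next;
        }
        return $root;
    };

    # Join every pair whose distance is within the threshold.
    for my $i (0 .. $n - 1) {
        for my $j ($i + 1 .. $n - 1) {
            next if $dist->[$i][$j] > $thresh;
            my ($ri, $rj) = ($find->($i), $find->($j));
            $parent[$rj] = $ri if $ri != $rj;
        }
    }

    # Relabel the set roots as consecutive cluster IDs.
    my (%label, @ids);
    my $next_id = 0;
    for my $i (0 .. $n - 1) {
        my $root = $find->($i);
        $label{$root} = $next_id++ unless exists $label{$root};
        push @ids, $label{$root};
    }
    return @ids;
}

my @dist = (
    [0, 1, 5, 6, 9],
    [1, 0, 5, 6, 9],
    [5, 5, 0, 2, 9],
    [6, 6, 2, 0, 9],
    [9, 9, 9, 9, 0],
);
print join(' ', threshold_clusters(\@dist, 3)), "\n";    # 0 0 1 1 2
```

Note that this only enforces the "different clusters are more than X apart" criterion; members of one cluster can still be further than X apart through a chain of close neighbours.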

clustering perl • 7.5k views
13.5 years ago
Chad A. Davis

I've submitted a patch against Algorithm::Cluster to allow:

my $cluster_ids = $tree->cutthresh(3.75);

The patch adds this as an XS function (i.e., the code is in C). You can find it in this bug report:

https://rt.cpan.org/Public/Bug/Display.html?id=68482
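Without the patch, one workaround is to turn the threshold into a cluster count and reuse the existing number-based cut. If merge distances never decrease toward the root (true for single, complete and average linkage, but not guaranteed for centroid linkage), cutting at distance X leaves one cluster plus one more for every internal node whose distance exceeds X. A minimal sketch, assuming the node distances have already been collected into a plain Perl list; clusters_for_threshold is a hypothetical helper.

```perl
use strict;
use warnings;

# Number of clusters left after cutting a hierarchical tree at $thresh.
# $distances: merge distance of every internal node (n-1 values for n leaves).
# Assumes distances never decrease toward the root, so each merge above
# the threshold splits off exactly one extra cluster.
sub clusters_for_threshold {
    my ($distances, $thresh) = @_;
    my $above = grep { $_ > $thresh } @$distances;    # count in scalar context
    return 1 + $above;
}

# Merge distances of a hypothetical 5-leaf tree, lowest first.
my @merge_dist = (1.0, 2.0, 5.5, 9.0);
print clusters_for_threshold(\@merge_dist, 3.75), "\n";    # 3
```

Under those assumptions, passing this count to the tree's existing cut method should give the same partition as a true threshold cut, provided the tree stores its nodes in merge order (lowest distance first).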

Those interested in a quick pure-Perl solution can use this example, which relies on some undocumented XS interfaces:

use strict;
use warnings;
use Algorithm::Cluster;

# Returns one cluster ID per leaf (element). Every merge whose
# distance exceeds $thresh separates its two subtrees into
# different clusters.
sub cutthresh {
    my ($tree, $thresh) = @_;
    my @nodecluster;
    my @leafcluster;
    # Binary tree: the number of internal nodes is one less than the
    # number of leaves. $tree->length is the number of internal nodes.
    my $nnodes = $tree->length;
    # The last node is the root; walk down the tree from there.
    my $icluster = 0;
    # The root node belongs to cluster 0
    $nodecluster[$nnodes - 1] = $icluster++;
    for (my $i = $nnodes - 1; $i >= 0; $i--) {
        my $node = $tree->get($i);
        # Internal nodes are numbered -1, -2, ...; leaves are numbered 0, 1, 2, ...
        my $left    = $node->left;
        my $leftref = $left < 0 ? \$nodecluster[-$left - 1] : \$leafcluster[$left];
        my $assigncluster = $nodecluster[$i];
        # The left child always stays in the parent node's cluster
        $$leftref = $assigncluster;
        my $right = $node->right;
        # Put the right child into a new cluster when the threshold is exceeded
        if ($node->distance > $thresh) { $assigncluster = $icluster++ }
        my $rightref = $right < 0 ? \$nodecluster[-$right - 1] : \$leafcluster[$right];
        $$rightref = $assigncluster;
    }
    return @leafcluster;
}
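To try the same top-down walk without Algorithm::Cluster, it can be run on a plain Perl tree. This sketch assumes each internal node is a hash ref with left, right and distance keys, numbered as in the code above (internal node $i is referred to as -($i+1), leaves as 0, 1, 2, ..., root last), and that every child is stored before its parent. cut_plain_tree is a hypothetical name.

```perl
use strict;
use warnings;

# Top-down threshold cut on a plain array of internal nodes.
# Node $i is referenced by its parent as -($i+1); leaves are 0..n-1.
# The root is the last element, and children come before parents.
sub cut_plain_tree {
    my ($nodes, $thresh) = @_;
    my (@nodecluster, @leafcluster);
    my $icluster = 0;
    $nodecluster[$#$nodes] = $icluster++;    # the root starts cluster 0

    # Record a cluster ID for a child, whether it is a node or a leaf.
    my $assign = sub {
        my ($child, $id) = @_;
        if ($child < 0) { $nodecluster[-$child - 1] = $id }
        else            { $leafcluster[$child]      = $id }
    };

    for (my $i = $#$nodes; $i >= 0; $i--) {
        my $node = $nodes->[$i];
        # The left child always stays in its parent's cluster.
        $assign->($node->{left}, $nodecluster[$i]);
        # The right child opens a new cluster if this merge exceeds the threshold.
        my $right_id = $node->{distance} > $thresh ? $icluster++ : $nodecluster[$i];
        $assign->($node->{right}, $right_id);
    }
    return @leafcluster;
}

# Leaves 0,1 merge at 1.0; leaves 2,3 merge at 2.0; the two pairs merge at 5.0.
my @nodes = (
    { left => 0,  right => 1,  distance => 1.0 },
    { left => 2,  right => 3,  distance => 2.0 },
    { left => -1, right => -2, distance => 5.0 },
);
print join(' ', cut_plain_tree(\@nodes, 3.75)), "\n";    # 0 0 1 1
```

Because a parent's cluster must be known before its children are visited, the walk relies on the root being last and children being stored before their parents.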

The pure-Perl version has since been released on CPAN as Algorithm::Cluster::Thresh, for those who are interested: http://p3rl.org/Algorithm::Cluster::Thresh

13.5 years ago

Do you have to use Perl? I'm a huge fan of Perl, but for this sort of task I would use R. I used this website to learn clustering in R. You can still connect Perl to R if you have to: I played with RSPerl for a while, but in the end it was easier for me to just use R scripts.
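One low-tech way to drive R from Perl is to write the distance matrix to a file and generate a small R script for Rscript to run. A minimal sketch, assuming R is installed; the file names and the helper write_r_job are hypothetical, while hclust, as.dist and cutree are standard R functions (cutree's h argument cuts at a height rather than at a cluster count).

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Write dist.csv and cut.R into $dir. Running "Rscript cut.R" from $dir
# would write one cluster ID per line to clusters.txt.
sub write_r_job {
    my ($dist, $thresh, $dir) = @_;
    my $csv = File::Spec->catfile($dir, 'dist.csv');
    open my $out, '>', $csv or die "Cannot write $csv: $!";
    print {$out} join(',', @$_), "\n" for @$dist;
    close $out or die "Cannot close $csv: $!";

    my $script = File::Spec->catfile($dir, 'cut.R');
    open my $r, '>', $script or die "Cannot write $script: $!";
    print {$r} <<"END_R";
m <- as.matrix(read.csv("dist.csv", header = FALSE))
hc <- hclust(as.dist(m), method = "complete")
ids <- cutree(hc, h = $thresh)
write.table(ids, "clusters.txt", row.names = FALSE, col.names = FALSE)
END_R
    close $r or die "Cannot close $script: $!";
    return $script;
}

my $dir    = tempdir(CLEANUP => 1);
my $script = write_r_job([[0, 1, 5], [1, 0, 5], [5, 5, 0]], 3.75, $dir);
print "Run: cd $dir && Rscript cut.R\n";
```

The R script uses relative paths, so it has to be run from the directory it was written to; Perl can then read clusters.txt back line by line.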
