Visualizing K-Means Clustering Results
13.0 years ago
RB ▴ 20

Hi All

I clustered my data into 300 clusters using k-means clustering in R. Can anyone please help me plot these results as a scatter plot in R?

Thanks very much.

RT

visualization clustering r • 26k views

What kind of data is it? How many dimensions?


Yep, how many label types do you have?


It is expression data, say 15 samples and 10,000 genes. I first clustered the data using hierarchical clustering and got 300 clusters. Then I ran k-means clustering with the number of clusters set to 300. When I use the plot function, it does not plot anything. I am new to R, please help.


So, you want to plot each of your 10,000 genes as a point and have them visually grouped together or colored according to which of the 300 clusters they belong to? I'm not clear on exactly what you want. If you want a scatter plot, you need to define the x and y axes, and it's not clear what those would be given that you have 15 samples.
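
One possible way to get x and y axes, sketched on made-up data: project the genes onto the first two principal components with prcomp() and color each point by its k-means cluster. The matrix `expr`, its size, and the choice of 5 clusters are assumptions for illustration, not the poster's data.

```r
set.seed(1)
# Hypothetical expression matrix: 200 genes (rows) x 15 samples (columns)
expr <- matrix(rnorm(200 * 15), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("sample", 1:15)))

# Cluster the genes; a small number of clusters keeps the colors readable
km <- kmeans(expr, centers = 5, nstart = 10)

# PCA on the genes: each gene gets a coordinate on PC1 and PC2
pca <- prcomp(expr)
coords <- pca$x[, 1:2]

plot(coords[, 1], coords[, 2], col = rainbow(5)[km$cluster], pch = 20,
     xlab = "PC1", ylab = "PC2", main = "Genes colored by k-means cluster")
```

With 300 clusters the same code runs, but 300 colors will be very hard to tell apart on one plot.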

13.0 years ago

I suggest using the ade4 package: you just have to turn your k-means result into a factor:

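library(ade4)
# Toy data: 15 points in two dimensions
dimA <- runif(15)
dimB <- runif(15)
myData <- data.frame(dimA, dimB)
kres <- kmeans(myData, 3)
plot(myData)
# Turn the cluster assignments into a factor and overlay the classes on the plot
kmeansRes <- factor(kres$cluster)
s.class(myData, fac = kmeansRes, add.plot = TRUE, col = rainbow(nlevels(kmeansRes)))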

If your data has row names (i.e. sample names), I advise using the s.label() function instead of plot().

8.1 years ago
Ron ★ 1.2k

The method is similar to what Obi used, but I used ggplot2 to plot the final figure. This assumes RNA expression data where the samples are columns and the genes are rows.

K-means clustering

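library(fpc)
library(ggplot2)
# t(data) puts samples in rows, so the samples are what gets clustered
kclust <- kmeans(t(data), centers = 3)
kclust$cluster <- as.factor(kclust$cluster)
# Classical MDS on the Euclidean distances between samples
d <- dist(t(data), method = "euclidean")
fit <- cmdscale(d, eig = TRUE, k = 2) # k is the number of dimensions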

ggplot visualization

p <- ggplot(data.frame(t(data)), aes(fit$points[, 1], fit$points[, 2], color = factor(kclust$cluster)))
p <- p + geom_point(size = 4)
p <- p + theme(axis.title.y = element_text(size = rel(1.5), angle = 90))
p <- p + theme(axis.title.x = element_text(size = rel(1.5), angle = 0))
p <- p + theme(axis.text = element_text(size = 16, angle = 90), axis.title = element_text(size = 20, face = "bold"))
p <- p + theme(legend.text = element_text(size = 14, colour = "black"))
p <- p + theme(legend.title = element_text(size = 18, colour = "black"))
p <- p + theme(legend.key.size = unit(1.5, "cm"))
p

I used your code and it runs fine, but when I try to plot I get this error: "Error: Aesthetics must be either length 1 or the same as the data (11): x, y, colour"

Can you tell me what the issue is?


You need to check how you loaded your matrix. @Ron used t(data), which transposes the data. If your matrix already has the samples in rows, remove the transpose and it should work.
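
That error generally means the vectors passed to aes() do not have as many elements as the data frame given to ggplot() has rows. A sketch of one way to avoid it, assuming genes in rows and samples in columns: put the MDS coordinates and the cluster labels into a single data frame and plot from that. The matrix `expr` is made-up data, and the ggplot2 part is skipped if the package is not installed.

```r
set.seed(1)
# Hypothetical expression matrix: 100 genes (rows) x 11 samples (columns)
expr <- matrix(rnorm(100 * 11), nrow = 100,
               dimnames = list(NULL, paste0("sample", 1:11)))

samples <- t(expr)                      # one row per sample
kclust <- kmeans(samples, centers = 3)
fit <- cmdscale(dist(samples), k = 2)   # 11 x 2 matrix of MDS coordinates

# One data frame with one row per sample, so every aesthetic has the same length
mds <- data.frame(x = fit[, 1], y = fit[, 2],
                  cluster = factor(kclust$cluster),
                  sample = rownames(samples))
stopifnot(nrow(mds) == nrow(samples))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mds, aes(x, y, color = cluster)) + geom_point(size = 4)
  print(p)
}
```

Because x, y and cluster all live in `mds`, a mismatch in orientation shows up when the data frame is built rather than as an aesthetics error at plot time.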

12.8 years ago

What about a PCA/MDS plot? You could use the distances between genes and then color each gene according to the k-means cluster it belongs to. Try the code below. I used kcca() from the flexclust package instead of the standard kmeans() function so that I could make sure the same distance metric was used for both the k-means clustering and the MDS plot. The only thing I'm not sure about is how well it will work with 300 clusters. I think that, no matter what, it will be hard to see differences between that many clusters on a scatter plot.

library(flexclust)
#Imaginary data with 3 samples and 1000 genes
myData<-data.frame(sample1=runif(1000),sample2=runif(1000),sample3=runif(1000))

#Perform k-means clustering
knum=5 #Set desired number of clusters
kres=kcca(myData,k=knum, family=kccaFamily("kmeans", dist="Euclidian", cent="mean"))
cluster_assignments=kres@cluster

#Calculate distance matrix and then perform MDS/PCA
d=dist(myData, method = "euclidean") # euclidean distances between the rows
fit=cmdscale(d,eig=TRUE, k=2) # k is the number of dim

#Plot solution: empty frame first, then points colored by cluster
plot(x=fit$points[,1], y=fit$points[,2], xlab="Coordinate 1", ylab="Coordinate 2", main="MDS", type="n")
colors=rainbow(knum)[cluster_assignments]
points(x=fit$points[,1], y=fit$points[,2], cex=.7, col=colors, pch=20)

[Figure: MDS example]


I am doing k-means clustering and found this method for visualizing the result. How can I add a legend that shows the sample names in this plot? My data is expression data.
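
A minimal base-R sketch of one option, assuming the samples (columns) are what was clustered: write each sample name next to its point with text() and add a cluster key with legend(). The data and the number of clusters are invented for illustration.

```r
set.seed(1)
# Hypothetical expression matrix: 500 genes (rows) x 12 samples (columns)
expr <- matrix(rnorm(500 * 12), nrow = 500,
               dimnames = list(NULL, paste0("sample", 1:12)))

knum <- 3
samples <- t(expr)                       # one row per sample
clusters <- kmeans(samples, centers = knum)$cluster
fit <- cmdscale(dist(samples), k = 2)    # MDS coordinates, one row per sample
cols <- rainbow(knum)

plot(fit[, 1], fit[, 2], col = cols[clusters], pch = 20,
     xlab = "Coordinate 1", ylab = "Coordinate 2", main = "MDS")
# Sample names just above the points
text(fit[, 1], fit[, 2], labels = rownames(samples), pos = 3, cex = 0.7)
# Key mapping colors to clusters
legend("topright", legend = paste("Cluster", 1:knum), col = cols, pch = 20)
```

Labeling points directly tends to scale better than listing every sample in the legend, which gets crowded quickly.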

13.0 years ago
Raygozak ★ 1.4k

You can also look at this blog post and what they call a clustergram to assess the clusters found:

http://www.r-statistics.com/2010/06/clustergram-visualization-and-diagnostics-for-cluster-analysis-r-code/

13.0 years ago
Vitis ★ 2.6k

The ggplot2 package in R has very nice ways to show clusters, for example by plotting the mean or median as a line and the sd or quantiles as a shaded band. You will probably find sample code for that in the manual or on the website: http://had.co.nz/ggplot2/
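
As a rough sketch of what such a plot could look like (my own example, not taken from the ggplot2 manual): compute the mean and standard deviation of expression per cluster and sample, then draw the mean as a line with a band of plus or minus one sd. The data, the column names `avg` and `sdev`, and the 4 clusters are assumptions. Plotting is skipped if ggplot2 is not installed.

```r
set.seed(1)
n_genes <- 300; n_samples <- 15; knum <- 4
# Hypothetical expression matrix: genes in rows, samples in columns
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes)
cluster <- kmeans(expr, centers = knum, nstart = 10)$cluster

# Long format: one row per gene/sample pair (as.vector() goes column by column)
long <- data.frame(cluster = factor(rep(cluster, times = n_samples)),
                   sample = rep(seq_len(n_samples), each = n_genes),
                   value = as.vector(expr))

# Mean and sd of expression for every cluster/sample combination
prof <- aggregate(value ~ cluster + sample, data = long, FUN = mean)
names(prof)[3] <- "avg"
prof$sdev <- aggregate(value ~ cluster + sample, data = long, FUN = sd)$value

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(prof, aes(sample, avg)) +
    geom_ribbon(aes(ymin = avg - sdev, ymax = avg + sdev), alpha = 0.3) +
    geom_line() +
    facet_wrap(~ cluster)
  print(p)
}
```

One panel per cluster is readable for a handful of clusters. For 300 clusters you would probably want to plot only a selected subset.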
