Hi All
I clustered my data into 300 clusters using k-means clustering in R. Can anyone please help me plot these results as a scatter plot in R?
Thanks very much.
RT
I suggest the ade4 package: you just have to turn your k-means result into a factor:
library(ade4)
dimA<-runif(15)
dimB<-runif(15)
myData<-data.frame(dimA,dimB)
kres<-kmeans(myData,3)
plot(myData)
kmeansRes<-factor(kres$cluster)
s.class(myData,fac=kmeansRes, add.plot=TRUE, col=rainbow(nlevels(kmeansRes)))
If you have row names (i.e. sample names), I advise using the s.label() function instead of plot().
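For readers without ade4 installed, roughly the same labelled plot can be sketched in base R. This is only a minimal sketch, not what s.label() does internally: the toy data, the sample names and the variable clusterCols are made up for illustration.

```r
# Hypothetical toy data: 15 named samples in two dimensions
set.seed(1)
myData <- data.frame(dimA = runif(15), dimB = runif(15),
                     row.names = paste0("sample", 1:15))

kres <- kmeans(myData, centers = 3)
kmeansRes <- factor(kres$cluster)
clusterCols <- rainbow(nlevels(kmeansRes))

# Draw an empty frame first, then write each row name at its coordinates,
# coloured by the cluster the row was assigned to
plot(myData, type = "n", main = "k-means clusters (k = 3)")
text(myData$dimA, myData$dimB, labels = rownames(myData),
     col = clusterCols[as.integer(kmeansRes)], cex = 0.8)

# Mark the three cluster centres with crosses in the matching colours
points(kres$centers, pch = 4, cex = 2, lwd = 2, col = clusterCols)
```

With many points the labels will overlap, so this is probably only readable for a few dozen samples.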
The method is similar to what Obi used, but I used ggplot2 to plot the final figure. This assumes RNA expression data where the samples are columns and the genes are rows.
library(ggplot2)
# data: expression matrix with genes in rows and samples in columns
kclust <- kmeans(t(data), centers = 3)  # cluster the samples
d <- dist(t(data), method = "euclidean")
fit <- cmdscale(d, eig = TRUE, k = 2)  # k is the number of dimensions
# Collect the MDS coordinates and the cluster labels in one data frame for ggplot
mds <- data.frame(Coordinate1 = fit$points[, 1], Coordinate2 = fit$points[, 2], cluster = factor(kclust$cluster))
p <- ggplot(mds, aes(x = Coordinate1, y = Coordinate2, color = cluster)) + geom_point(size = 4)
p <- p + theme(axis.title.y = element_text(size = rel(1.5), angle = 90))
p <- p + theme(axis.title.x = element_text(size = rel(1.5), angle = 0))
p <- p + theme(axis.text = element_text(size = 16, angle = 90), axis.title = element_text(size = 20, face = "bold"))
p <- p + theme(legend.text = element_text(size = 14, colour = "black"))
p <- p + theme(legend.title = element_text(size = 18, colour = "black"))
p <- p + theme(legend.key.size = unit(1.5, "cm"))
p
What about a PCA/MDS plot? You could use the distances between genes and then color them according to which k-means cluster they belong to. Try the code below. I used kcca() from the flexclust package instead of the standard kmeans() function so that I could make sure the same distance metric was used for both the k-means clustering and the MDS plot. The only thing I'm not sure about is how well it will work with 300 clusters. I think that, no matter what, it will be hard to visualize differences between that many clusters on a scatter plot.
library(flexclust)
#Imaginary data with 3 samples and 1000 genes
myData<-data.frame(sample1=runif(1000),sample2=runif(1000),sample3=runif(1000))
#Perform k-means clustering
knum=5 #Set desired number of clusters
kres=kcca(myData,k=knum, family=kccaFamily("kmeans")) # "kmeans" family: Euclidean distance, mean centroids
cluster_assignments=kres@cluster
#Calculate distance matrix and then perform MDS/PCA
d=dist(myData, method = "euclidean") # euclidean distances between the rows
fit=cmdscale(d,eig=TRUE, k=2) # k is the number of dim
#plot solution
plot(x=fit$points[,1], y=fit$points[,2], xlab="Coordinate 1", ylab="Coordinate 2", main="MDS", type="n")
colors=rainbow(knum)[cluster_assignments] # one color per cluster
points(x=fit$points[,1], y=fit$points[,2], cex=.7, col=colors, pch=20)
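For a larger gene set, the same picture can be sketched with prcomp() and the standard kmeans() function. Classical MDS on Euclidean distances gives the same configuration as the PCA scores (up to sign), while dist() on 10,000 genes would need a distance object of roughly 50 million entries. The matrix expr, its size, and knum = 20 below are assumptions made for illustration; they are not the original poster's data.

```r
# Hypothetical stand-in for an expression matrix: 2000 genes x 15 samples
set.seed(42)
expr <- matrix(rnorm(2000 * 15), nrow = 2000,
               dimnames = list(paste0("gene", 1:2000), paste0("sample", 1:15)))

knum <- 20  # assumed number of clusters; the question in this thread used 300
kres <- kmeans(expr, centers = knum, iter.max = 50, nstart = 3)

# PCA on the genes (rows); the first two score columns serve as x and y axes
pca <- prcomp(expr, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]

# One colour per cluster; with hundreds of clusters neighbouring hues
# become very hard to tell apart
clusterCols <- rainbow(knum)[kres$cluster]

plot(scores, col = clusterCols, pch = 20, cex = 0.7,
     xlab = "PC1", ylab = "PC2",
     main = "Genes in PCA space, coloured by k-means cluster")
```

Note that the clustering here uses all 15 dimensions while the plot shows only two, so clusters that are well separated in the full space may still overlap on the plot.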
You can also look at this blog and what they call a clustergram to assess the clusters found.
The ggplot2 package in R has very nice ways to show clusters, by plotting the mean/median as lines and the sd or quantiles as shaded bands. You will probably find sample code for that in the manual or on the website: http://had.co.nz/ggplot2/
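The mean-line-with-shaded-spread idea can also be sketched in base R, without ggplot2. The example below is only an illustration under assumed inputs: the matrix expr, the choice of 4 clusters, and the names clusterMean and clusterSd are hypothetical. It puts the samples on the x axis and draws, for each cluster, the mean expression with a band of one standard deviation on either side.

```r
# Hypothetical expression matrix: 600 genes x 15 samples
set.seed(7)
expr <- matrix(rnorm(600 * 15), nrow = 600,
               dimnames = list(NULL, paste0("S", 1:15)))
kres <- kmeans(expr, centers = 4, nstart = 5)

# Per-cluster mean and standard deviation at each sample
# (rows = clusters, columns = samples)
clusterMean <- apply(expr, 2, function(v) tapply(v, kres$cluster, mean))
clusterSd   <- apply(expr, 2, function(v) tapply(v, kres$cluster, sd))

x <- seq_len(ncol(expr))
cols <- rainbow(nrow(clusterMean))

# Empty frame sized to hold every band, with sample names on the x axis
plot(NA, xlim = range(x),
     ylim = range(clusterMean - clusterSd, clusterMean + clusterSd),
     xaxt = "n", xlab = "Sample", ylab = "Expression",
     main = "Cluster mean profiles +/- 1 sd")
axis(1, at = x, labels = colnames(expr), las = 2)

for (k in seq_len(nrow(clusterMean))) {
  # Shaded band: lower edge left to right, then upper edge right to left
  polygon(c(x, rev(x)),
          c(clusterMean[k, ] - clusterSd[k, ],
            rev(clusterMean[k, ] + clusterSd[k, ])),
          col = adjustcolor(cols[k], alpha.f = 0.2), border = NA)
  lines(x, clusterMean[k, ], col = cols[k], lwd = 2)
}
```

With 300 clusters a single panel like this would be unreadable, so it is probably more useful for a handful of selected clusters at a time.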
What kind of data is it? How many dimensions?
Yep, how many label types do you have?
It is expression data, say 15 samples and 10,000 genes. I first clustered the data using hierarchical clustering and got 300 clusters. Then I ran k-means clustering with the number of clusters set to 300. When I use the plot function, it does not plot anything. I am new to R, please help.
So, you want to plot each of your 10,000 genes as a point and have them visually grouped together or colored according to which of the 300 clusters they belong to? I'm not clear on exactly what you want. If you want a scatterplot, you need to define the x and y axes, and it's not clear what those would be given that you have 15 samples.