Question

Clustering Method For Coexpressed Genes

1


12.9 years ago

GR ▴ 400

Hi All,

Can anyone suggest the best clustering algorithm for checking the coexpression of genes? I used the k-means clustering algorithm, but I suspect it does not cluster correctly.

I have microarray data for 10 samples in total, from different conditions/tissues. When I cluster the data from all 10 samples, I get different results than when I cluster the data for just one sample. That is, the same genes end up in different clusters in the two cases. I would expect some variation, but the results are entirely different.

Please help; I am new to this.

Thanks, Ritu

clustering gene • 4.2k views

updated 10.5 years ago by karl.stamm 4.1k • written 12.9 years ago by GR ▴ 400

1


@Ritu: how do you cluster one sample?

12.9 years ago by Steve Lianoglou 5.2k

0


You need to add a little more info to your question before we can reasonably answer. Things like: What program/package are you using to do the clustering? Which distance metric? What value of K are you using, and how did you choose it? Is it 10 samples per condition/tissue or 10 samples total? How many replicates per condition? Why do you believe the clustering is 'incorrect'?

Also, getting different results when running 10 samples and when running 1 sample is not very informative. Microarrays have lots of noise and when clustering based on one array you may just be clustering noise.

12.9 years ago by Will 4.6k

0


"When I cluster data from all the 10 samples then it gives different results than if I cluster data for just one sample." I totally overlooked this sentence, but the statement seems odd. I think you need to explain your question more clearly.

12.9 years ago by Michael 55k

score 7 · Answer 1 · 2012-01-12

This has been said very often:

There is no general best clustering algorithm.
Cluster analysis is an exploratory technique, so the best algorithm for your data is the one that helps you make a novel discovery that leads to an interesting hypothesis.
Therefore, you have to try out many different supervised and unsupervised methods (k-means, hierarchical clustering (which additionally offers many different distance measures and inter-cluster distance measures), fuzzy clustering, model-based clustering, PCA, ICA, LDA, QDA, ...).
The outcome of k-means is non-deterministic and depends on your initial centroid vectors, so you have to run the algorithm multiple times.
Keep a close eye on the biological background and question; your analysis must make sense in that respect.
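
To illustrate the point about k-means depending on its starting centroids, here is a minimal sketch in plain Python (standard library only). The expression values, the choice of k = 3, and the function names `kmeans_once` and `kmeans_restarts` are all made up for illustration; this is not taken from any particular package. It runs k-means from several random starts and keeps the run with the lowest within-cluster sum of squares.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_once(rows, k, rng, n_iter=100):
    """One k-means run from a random start; returns (labels, within-cluster SS)."""
    centroids = rng.sample(rows, k)  # initial centroids: k randomly chosen genes
    labels = None
    for _ in range(n_iter):
        # Assign every gene to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(r, centroids[c]))
                      for r in rows]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Move each centroid to the mean profile of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(sq_dist(r, centroids[lab]) for r, lab in zip(rows, labels))
    return labels, wss

def kmeans_restarts(rows, k, n_starts=20, seed=0):
    """Repeat k-means from n_starts random starts; return the best run and all scores."""
    rng = random.Random(seed)
    runs = [kmeans_once(rows, k, rng) for _ in range(n_starts)]
    best = min(runs, key=lambda run: run[1])
    return best, [wss for _, wss in runs]

# Hypothetical toy data: 6 genes (rows) x 4 samples (columns).
genes = [
    [1.0, 1.1, 5.0, 5.2],
    [0.9, 1.0, 5.1, 4.9],
    [5.0, 5.1, 1.0, 0.9],
    [5.2, 4.9, 1.1, 1.0],
    [3.0, 3.1, 3.0, 2.9],
    [2.9, 3.0, 3.1, 3.0],
]

(best_labels, best_wss), all_wss = kmeans_restarts(genes, k=3)
print("within-cluster SS per start:", [round(w, 2) for w in all_wss])
print("best labels:", best_labels)
```

If the per-start scores differ, the individual runs ended in different local optima, which is why a single k-means run should not be trusted on its own.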

score 1 · Answer 2 · 2014-05-16

1


10.5 years ago

Ares Cao ▴ 20

I think you need to try various clustering methods and see whether any obvious pattern emerges.


Ram · Answer 3 · 2014-05-16

Hi there,

Some algorithms perform better than others, but in general there is no clear overall winner for all datasets. I would suggest looking at some comparative analysis papers, such as:

Clustering cancer gene expression data: a comparative study

Also consider that the distance measure you use with the clustering algorithm may affect the quality of your results. See, for instance:

On the selection of appropriate distances for gene expression data clustering
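
As a rough illustration of why the distance measure matters, the sketch below compares Euclidean distance with a correlation-based distance (1 minus the Pearson correlation) on three made-up expression profiles. The profiles and function names are hypothetical and chosen only to make the contrast obvious.

```python
import math

def euclidean(a, b):
    """Euclidean distance: sensitive to the absolute expression level."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_distance(a, b):
    """1 - Pearson correlation: 0 for the same profile shape, 2 for a mirrored one.

    Assumes neither profile is constant (otherwise the variance is zero).
    """
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return 1.0 - cov / (sd_a * sd_b)

# Hypothetical profiles across 4 samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]  # same trend as gene_a, 10x the level
gene_c = [4.0, 3.0, 2.0, 1.0]      # similar level to gene_a, opposite trend

for name, other in (("b", gene_b), ("c", gene_c)):
    print(f"a vs {name}: euclidean={euclidean(gene_a, other):.2f}, "
          f"pearson_distance={pearson_distance(gene_a, other):.2f}")
```

With Euclidean distance, gene_a sits closer to gene_c, which has the opposite trend; with the correlation-based distance it sits closer to gene_b, which rises and falls with it. For coexpression, the second behaviour is often what people are after, but that depends on the question being asked.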

Hope it helps,

Cheers!

Ram · Answer 4 · 2014-05-17

It's not clear what you want to do with the data. Of course the results will look different depending on how you look at the data. These data are very high-dimensional, so there are many ways to reduce them to a two-dimensional figure. A PCA plot is one view, the one that captures the maximum variation, but you could pick any two dimensions, make a scatter plot, and run k-means on that.

For ready-made tools, take a look through the algorithms available in Bioconductor, and search for your particular microarray chip to see whether there is gene annotation aligned with the probes you have.