Clustering Method For Coexpressed Genes
12.9 years ago
GR ▴ 400

Hi All,

Can anyone suggest which clustering algorithm is best for checking the coexpression of genes? I used the k-means clustering algorithm, but I suspect it does not cluster correctly.

I have microarray data for a total of 10 samples from different conditions/tissues. When I cluster the data from all 10 samples, I get different results than when I cluster the data for just one sample: the same genes end up in different clusters in the two cases. I would expect some variation, but the results are entirely different.

Please help; I am new to this.

Thanks, Ritu

clustering gene • 4.2k views

@Ritu: how do you cluster one sample?


You need to add a little more info to your question before we can reasonably answer. Things like: What program/package are you using to do the clustering? Which distance metric? What value of K are you using, and how did you choose it? Is it 10 samples per condition/tissue or 10 samples total? How many replicates per condition? Why do you believe the clustering is 'incorrect'?

Also, getting different results when running 10 samples and when running 1 sample is not very informative. Microarrays are noisy, and when clustering on a single array you may just be clustering noise.
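To illustrate why a single array carries no coexpression signal, here is a minimal sketch using Pearson correlation as the similarity measure. The gene values are made up, and `pearson` is a hypothetical helper written for this example, not part of any package mentioned in the thread. With one sample per gene there is no variance, so the correlation between two genes is simply undefined.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles; None if undefined."""
    n = len(x)
    if n < 2:
        return None  # a single measurement has no variance to correlate
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # a flat profile also has undefined correlation
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Made-up expression values for two genes across 10 samples (assumption)
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
gene_b = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 16.2, 17.9, 20.0]

r_all = pearson(gene_a, gene_b)          # profile across all 10 samples
r_one = pearson(gene_a[:1], gene_b[:1])  # only the first sample
```

Here `r_all` is a usable similarity between the two profiles, while `r_one` is `None`: with one sample, genes can only be grouped by their expression level on that array, which is a different question from coexpression.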


"When I cluster data from all the 10 samples then it gives different results than if I cluster data for just one sample." I totally overlooked this sentence, but that statement seems odd. I think you need to explain your question better.

12.9 years ago
Michael 55k

This has been said very often:

  1. There is no general best clustering algorithm.
  2. Cluster analysis is an exploratory technique, so the best algorithm for your data is the one that helps you make a novel discovery that leads to an interesting hypothesis.
  3. Therefore, you have to try out many different supervised and unsupervised methods (k-means, hierarchical clustering (with many possible distance measures and inter-cluster distance measures), fuzzy clustering, model-based clustering, PCA, ICA, LDA, QDA, ...).
  4. The outcome of k-means is non-deterministic and depends on your initial centroid vectors. You have to run the algorithm multiple times.
  5. Keep a close eye on the biological background and question; your analysis must make sense in that respect.
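
Point 4 can be sketched as follows. This is a plain, standard-library implementation of Lloyd's k-means written only for illustration, not the code of any particular package; the function names (`kmeans`, `best_of`), the toy data, and the choice of 20 restarts are all assumptions. Each run starts from different randomly chosen centroids, and the run with the lowest within-cluster sum of squares is kept.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, seed, n_iter=50):
    """One k-means run from one random initialisation.
    Returns (labels, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k random data points as start centroids
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])  # leave an empty cluster where it is
        if updated == centroids:
            break  # converged
        centroids = updated
    labels = assign(points, centroids)
    wss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, wss

def best_of(points, k, n_starts=20):
    """Repeat k-means from several seeds and keep the tightest solution."""
    runs = [kmeans(points, k, seed) for seed in range(n_starts)]
    return min(runs, key=lambda run: run[1])

# Toy 2-D "profiles" forming three obvious groups (made-up data)
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
          [20.0, 0.0], [20.0, 1.0], [21.0, 0.0]]

labels, wss = best_of(points, 3)
```

Comparing the sum of squares across individual seeds shows how much single runs can differ; if the best value is reached by only a few starts, more restarts (or a better initialisation scheme) are probably needed.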
10.5 years ago
Ares Cao ▴ 20

I think various clustering methods need to be tried to see whether any obvious pattern emerges.

10.5 years ago
NetunoPoncã ▴ 160

Hi there,

Some algorithms perform better than others, but in general there is no clear overall winner across all datasets. I would suggest looking at some comparative analysis papers, such as:

Clustering cancer gene expression data: a comparative study

Also consider that the distance measure you use with the clustering algorithm may affect the quality of your results; see, for instance:

On the selection of appropriate distances for gene expression data clustering
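
As a small illustration of how much the distance choice matters (this is not taken from either paper), the sketch below compares plain Euclidean distance on raw values with Euclidean distance after standardising each gene's profile. The three gene profiles and the helper names are made up for the example.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def zscore(x):
    """Standardise one profile to mean 0 and (population) standard deviation 1."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / n)
    return [(a - m) / sd for a in x]

def nearest(target, table):
    """Name of the gene closest to `target` under Euclidean distance."""
    others = [g for g in table if g != target]
    return min(others, key=lambda g: euclidean(table[target], table[g]))

# Hypothetical profiles across four samples
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],      # rising, low expression
    "geneB": [10.0, 20.0, 30.0, 40.0],  # same rising shape, 10x higher
    "geneC": [4.0, 3.0, 2.0, 1.0],      # falling, low expression
}

raw_neighbour = nearest("geneA", profiles)
scaled = {g: zscore(v) for g, v in profiles.items()}
scaled_neighbour = nearest("geneA", scaled)
```

On raw values geneA is closest to geneC, because both are expressed at a similar level; after standardisation it is closest to geneB, which shares its shape. Which of the two counts as "coexpressed" depends on the biological question, so the distance should be chosen deliberately.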

Hope it helps,

Cheers!

10.5 years ago

It's not clear what you want to do with the data. Of course it will look different when viewed in different ways. These are very high-dimensional data, so there are many ways to reduce them to a two-dimensional figure. The PCA plot is one view that captures maximum variation, but you could pick any two dimensions to make a scatter plot and run k-means over.

For ready-made tools, look through the algorithms available in Bioconductor, and search for your particular microarray chip to see whether there is gene annotation aligned with the probes you have.
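
To make the "maximum variation" idea concrete, here is a rough sketch that finds the first principal component by power iteration on the covariance matrix and projects each row onto it. It uses only the standard library; the function name, the random start vector, and the toy matrix are assumptions for illustration, and a real analysis would normally use an established PCA routine instead.

```python
import math
import random

def first_pc(rows, n_iter=200, seed=0):
    """Return (direction, scores) for the first principal component.
    rows: list of observations, each a list of d measurements."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # d x d sample covariance matrix
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random start direction
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # renormalise after each multiplication
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centred]
    return v, scores

# Made-up data: four observations measured in two dimensions, lying on y = 2x
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, scores = first_pc(rows)
```

For this toy matrix the direction comes out proportional to (1, 2), up to sign, since all the variation lies along that line. The second component could be obtained the same way after removing the first, giving the two axes of a PCA scatter plot.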
