Genes for clustering

7.1 years ago

lirongrossmann ▴ 40

Hi everyone, I have a group of samples that is supposed to be biologically homogeneous. I want to cluster the genes to see which are highly expressed and which are lowly expressed across all samples. I tried hierarchical clustering, but it got stuck because there are so many genes. I don't want to use PCA because I want to capture the genes that are uniformly expressed across the samples, not the ones that are most variable. Any suggestions on how to choose the genes to cluster for my purpose? Thanks

clustering hierarchical clustering • 1.8k views

What is the data? What do you mean by "hierarchical clustering got stuck"?

7.1 years ago by Jean-Karim Heriche 27k

The data is RNA-seq. There were too many genes (I have 30,000) and the program is still running. It runs nicely with fewer genes. Thanks

7.1 years ago by lirongrossmann ▴ 40

It should not take that much time. However, you can filter out the non-variable genes and hope that reduces the number.
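A minimal sketch of the variance filter suggested above, using only the Python standard library. The function name `filter_by_variance`, the threshold, and the toy expression values are all made up for illustration; a real analysis would normally work on normalized, log-transformed counts and pick the threshold from the data.

```python
from statistics import pvariance


def filter_by_variance(expr, min_variance):
    """Keep genes whose variance across samples is at least min_variance.

    expr maps a gene name to a list of expression values, one per sample.
    """
    return {
        gene: values
        for gene, values in expr.items()
        if pvariance(values) >= min_variance
    }


# Toy log-scale expression values (hypothetical), four samples per gene.
expr = {
    "geneA": [5.0, 5.1, 4.9, 5.0],  # nearly constant, moderate level
    "geneB": [0.1, 8.2, 3.3, 6.0],  # highly variable
    "geneC": [9.0, 9.0, 9.1, 8.9],  # nearly constant, high level
}

kept = filter_by_variance(expr, min_variance=1.0)
print(sorted(kept))
```

Note that this keeps the variable genes. For the goal stated in the question (genes that are uniformly expressed across samples), the comparison would presumably be flipped to keep the low-variance genes, which could then be grouped by their mean expression.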

7.1 years ago by Puli Chandramouli Reddy ▴ 190

What's the size of the data, how much RAM does your computer have, and which algorithm and implementation are you using? I presume the data is a 30000 x p matrix. What's p? Even for large p, this shouldn't take long unless your computer is underpowered (i.e. not enough RAM) and/or you are using an inefficient implementation of the algorithm.
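To make the RAM question concrete, here is a rough back-of-the-envelope sketch. It assumes the implementation stores all pairwise gene-gene distances in condensed form (one value per unordered pair) as 8-byte floats; that is an assumption about the implementation, not something stated in the thread, and the helper name `condensed_distance_bytes` is hypothetical.

```python
def condensed_distance_bytes(n_items, bytes_per_value=8):
    """Bytes needed to hold one distance per unordered pair of items."""
    n_pairs = n_items * (n_items - 1) // 2
    return n_pairs * bytes_per_value


n_genes = 30_000
size_bytes = condensed_distance_bytes(n_genes)
print(f"{size_bytes / 1e9:.1f} GB")  # prints "3.6 GB"
```

Under these assumptions the distance matrix alone needs about 3.6 GB regardless of p, and an implementation that keeps the full square matrix would need roughly twice that. On a machine with little RAM this can push the job into swapping, which would look like the program being stuck.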

7.1 years ago by Jean-Karim Heriche 27k
