7.1 years ago
lirongrossmann
Hi everyone, I have a group of samples that is supposed to be biologically homogeneous. I want to cluster the genes to see which are highly expressed and which are lowly expressed across all samples. I tried hierarchical clustering, but it got stuck because there are so many genes. I don't want to use PCA because I want to capture the genes that are uniformly expressed across the samples, not the ones that are most variable. Any suggestions on how to choose the genes to cluster for my purpose? Thanks
What's the data? What do you mean by "hierarchical clustering got stuck"?
The data is RNA-seq. There are too many genes (I have 30,000) and the program is still running. It runs nicely with fewer genes. Thanks
It should not take that much time. However, you can filter out non-variable genes and hope that reduces the number.
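A minimal sketch of the variance filter suggested above, using only the Python standard library. The gene names, expression values, and the `min_variance` threshold are made up for illustration, and `filter_by_variance` is a hypothetical helper, not part of any package. Note that this filter discards the uniformly expressed genes the original poster wants to keep; flipping the comparison to `<` would keep those instead.

```python
from statistics import pvariance


def filter_by_variance(expr, min_variance):
    """Keep genes whose variance across samples is at least min_variance.

    expr maps a gene name to its list of per-sample expression values.
    """
    return {
        gene: values
        for gene, values in expr.items()
        if pvariance(values) >= min_variance
    }


# Toy data (hypothetical): three genes measured in four samples.
expr = {
    "geneA": [5.0, 5.1, 4.9, 5.0],  # nearly constant, high expression
    "geneB": [1.0, 8.0, 3.0, 9.0],  # highly variable
    "geneC": [0.0, 0.0, 0.0, 0.0],  # not expressed at all
}

kept = filter_by_variance(expr, min_variance=1.0)
print(sorted(kept))
```

On real RNA-seq data the threshold would probably be chosen on log-transformed or otherwise normalized values, since raw count variance scales with expression level.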
What's the size of the data, how much RAM does your computer have, and which algorithm and implementation are you using? I presume the data is a 30000 x p matrix. What's p? Even for large p, this shouldn't take long unless your computer is underpowered (i.e. not enough RAM) and/or you are using a bad or inefficient implementation of the algorithm.
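To put a rough number on the RAM question, here is a back-of-the-envelope sketch. It assumes the clustering implementation stores every pairwise distance between genes as an 8-byte float, which is common for agglomerative clustering but is not stated anywhere in this thread. `condensed_distance_bytes` is a hypothetical helper name.

```python
def condensed_distance_bytes(n_items: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to hold the n*(n-1)/2 pairwise distances between n items."""
    return n_items * (n_items - 1) // 2 * bytes_per_value


n_genes = 30_000  # number of genes mentioned in the thread
size_gib = condensed_distance_bytes(n_genes) / 2**30
print(f"{n_genes} genes -> about {size_gib:.2f} GiB of pairwise distances")
```

Under that assumption the storage does not depend on p at all, only on the number of genes, and it grows quadratically: halving the gene count cuts the requirement to roughly a quarter.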