Hi guys,
I have a big data.frame of RNA-Seq counts in which rows are genes and columns are samples.
I clustered this matrix and identified 6 major clusters. The clusters share some common genes, i.e. genes that do not vary much across the samples (around 100 patients), and each cluster also has genes that characterize it because their expression differs between clusters. For example, in one cluster 10 genes are highly expressed, while in all the other clusters the same genes are poorly expressed and are consistently low compared to the first cluster. Is there a way to select the most "significant" or variable genes that characterize each cluster with respect to the others, so that I end up with a list of cluster-specific genes whose expression is characteristic of that cluster? I know that one way is to compute a log2 fold change, but I would like to perform this analysis in an unsupervised way, without having to choose the comparisons for the fold change calculation myself. Can anyone help me with some ideas or references so that I can select the relevant cluster-specific genes?
Thank you in advance
e.
Thank you, Kevin, for your answer. I simply normalized my data and then performed an unsupervised hierarchical cluster analysis (HCA) using a Pearson correlation-based distance. I have no reference samples; the clusters simply emerged from the clustering. I really appreciate your work, and I think I could take inspiration from it. Thanks a lot.
Grazie - prego / You're welcome.
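For anyone landing on this thread later: one simple way to avoid hand-picking the comparisons is to score every gene in a one-vs-rest fashion, i.e. for each cluster compare the mean log2 expression inside the cluster against the mean over all remaining samples, and keep the top-scoring genes per cluster. The sketch below is only an illustration under stated assumptions, not the method suggested earlier in the thread: the function name `cluster_specific_genes`, the `pseudocount` and `top_n` parameters, and the toy data are all hypothetical, and the input is assumed to be already-normalized counts plus one cluster label per sample. It ranks genes by effect size only; it does not test significance.

```python
import math
from statistics import mean


def cluster_specific_genes(counts, labels, top_n=10, pseudocount=1.0):
    """Rank genes per cluster by one-vs-rest difference in mean log2 expression.

    counts: dict mapping gene name -> list of normalized counts (one per sample).
    labels: list with the cluster label of each sample, same order as the counts.
    Returns a dict mapping cluster -> list of (gene, score), highest score first.
    All names here are hypothetical; this is a sketch, not a reference method.
    """
    result = {}
    for cluster in sorted(set(labels)):
        inside = [i for i, lab in enumerate(labels) if lab == cluster]
        outside = [i for i, lab in enumerate(labels) if lab != cluster]
        scores = []
        for gene, values in counts.items():
            # log2 transform with a pseudocount so that zero counts are defined
            logs = [math.log2(v + pseudocount) for v in values]
            mean_in = mean([logs[i] for i in inside])
            mean_out = mean([logs[i] for i in outside])
            # positive score: gene is higher in this cluster than in the rest
            scores.append((gene, mean_in - mean_out))
        scores.sort(key=lambda pair: pair[1], reverse=True)
        result[cluster] = scores[:top_n]
    return result


# Toy example (made-up numbers): 3 genes, 6 samples, 3 clusters of 2 samples each
toy_counts = {
    "geneA": [100, 120, 5, 6, 4, 5],      # high only in cluster 1
    "geneB": [50, 52, 49, 51, 50, 48],    # flat across all samples
    "geneC": [3, 4, 90, 110, 3, 2],       # high only in cluster 2
}
toy_labels = [1, 1, 2, 2, 3, 3]

top = cluster_specific_genes(toy_counts, toy_labels, top_n=1)
for cluster, genes in top.items():
    print(cluster, genes)
```

Note that this is still a log2 fold change underneath; the only difference is that the comparisons (each cluster against everything else) are enumerated automatically. Because the clusters were derived from the same expression matrix, the scores are better read as a descriptive ranking than as independent evidence of differential expression.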