Dear Colleagues,
I have RNA-seq expression data from two cancer subtypes, each divided into two smaller groups (so I have 4 groups to compare in total). I would like to compare all 4 groups at once to see which gene profiles are shared and which differ between the groups. Could you suggest a suitable method? My data set is large: it covers 20k genes. I have already tried different variants of hierarchical clustering, but I only get the whole picture of all 20k genes. There are visible patterns, but they are not clearly separated, and I would need to filter the most differentiating genes manually. Is there any other way to contrast all 4 groups at once and filter out the genes that differentiate them well?
I would appreciate your help and advice very much. Thank you in advance.
Please correct me if I am wrong, but I don't think ANOVA will solve my issue. I have just realized that I didn't specify all the important details of the data. I have a large cohort study: 800 patients divided into 4 groups (2 cancer subtypes, each divided into 2 smaller groups), and each patient has a sequenced and processed (normalized) expression profile of 20k genes. When I read the description of ANOVA you provided, I couldn't find a way to A) analyze all 20k genes in all 800 patients, divided by the disease type factor, at once, and B) based on the results, filter the genes that are common/distinct between all 4 groups and get them by name.
Indeed, the methods that I proposed test each gene independently across your groups. Once every gene has been tested, you would still know which genes differ across your groups.
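To make the "test each gene independently, then get the genes by name" idea concrete, here is a minimal sketch in plain Python. It computes a one-way ANOVA F statistic per gene across the 4 groups and ranks the genes by it. The gene names, group labels and toy values are hypothetical, and the function names are my own. It stops at the F statistic; in a real analysis you would want p-values corrected for multiple testing, and most likely a dedicated RNA-seq package rather than hand-written code.

```python
# Per-gene one-way ANOVA F statistic, then a ranking of genes by name.
# All data and names below are hypothetical toy values.

def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups of values."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of samples
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        # No within-group variation: avoid dividing by zero.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_genes(expression, labels):
    """expression: {gene: [value per sample]}; labels: [group per sample].

    Returns (gene, F) pairs sorted from most to least group-separating.
    """
    group_names = sorted(set(labels))
    scores = {}
    for gene, values in expression.items():
        groups = [[v for v, lab in zip(values, labels) if lab == g]
                  for g in group_names]
        scores[gene] = f_statistic(groups)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Two samples per group; a real cohort would have far more.
labels = ["A1", "A1", "A2", "A2", "B1", "B1", "B2", "B2"]
expression = {
    "GENE_FLAT":    [5.0, 5.2, 5.1, 4.9, 5.0, 5.1, 4.8, 5.2],
    "GENE_SUBTYPE": [2.0, 2.1, 2.2, 1.9, 8.0, 8.1, 7.9, 8.2],
    "GENE_GROUP":   [1.0, 1.1, 4.0, 4.1, 7.0, 7.1, 10.0, 10.1],
}

ranked = rank_genes(expression, labels)
for gene, f in ranked:
    print(gene, round(f, 2))
```

The ranked list is what answers part B of the question: the genes at the top separate the groups best, and the ones at the bottom behave the same in all 4 groups.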
Alternatively, you can cluster all samples and genes together and then identify clusters in your data via various metrics, including
...or you can just 'cut' the dendrogram tree with the cutree() function.
Another idea would be to perform lasso-penalised regression, which would allow you to analyse all genes together, and across all samples. RandomForest® is another idea.
Another idea is to build correlation networks.
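As a rough illustration of the correlation-network idea, the sketch below (plain Python, hypothetical gene names and toy values) computes the Pearson correlation for every pair of genes and keeps the pairs whose absolute correlation reaches a threshold. The threshold of 0.9 and the function names are assumptions for the example, not recommendations.

```python
# Build a simple gene correlation network as a list of edges.
# All data and names below are hypothetical toy values.
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant gene has no defined correlation
    return cov / sqrt(vx * vy)


def correlation_network(expression, threshold=0.9):
    """Return (gene1, gene2, r) for every pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1],  # rises with GENE_A
    "GENE_C": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],   # falls as GENE_A rises
    "GENE_D": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],   # no clear relationship
}

edges = correlation_network(expression, threshold=0.9)
for g1, g2, r in edges:
    print(g1, g2, round(r, 3))
```

With 20k genes there are roughly 200 million gene pairs, so in practice you would filter the genes first (for example to the most variable ones) before building the network.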
It depends on what, exactly, you are hoping to achieve.
Thank you, Kevin. It seems there is a wide range of options; now I need to decide which one is appropriate for me.
Thanks @Kevin for your nice explanation. I would appreciate it if you could look at my question here