Question

Methods for contrasting gene expression profiles between multiple groups


5.5 years ago

orzech_mag ▴ 230

Dear Colleagues,

I've got RNA-seq expression data for two subtypes of cancer, each divided into two smaller groups, so in total I have 4 groups to compare. I would like to compare all 4 groups at once to see which gene profiles are common to these groups and which differ between them. Which method would you suggest? My data is large: it has 20k genes. I've already tried different variants of hierarchical clustering, but they give me a picture of all 20k genes at once. There are visible patterns, but they are not clearly separated, and I would need to filter the most differentiating genes manually. Is there any other option to contrast all 4 groups at once and filter out the genes that differentiate them well?

I would very much appreciate your help and advice. Thank you in advance.

Tags: RNA-Seq, Expression profiling

Written 5.5 years ago by orzech_mag; last updated 5.5 years ago by Kevin Blighe

score 2 · Answer 1 · 2019-06-03


5.5 years ago

Kevin Blighe 88k

Hey,

It seems like you need to apply ANOVA.

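If you have raw counts, e.g., from RNA-seq, then process them in DESeq2 and follow the guidelines for the Likelihood Ratio Test (LRT), which is akin to ANOVA: it tests whether a full model that includes your group factor fits each gene's counts better than a reduced model without it.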

If you have microarray data, or any other type of expression data that has already been normalised and transformed, then use standard tests. In R:

aov() - ANOVA ( http://www.sthda.com/english/wiki/one-way-anova-test-in-r )
kruskal.test() - non-parametric ANOVA ( http://www.sthda.com/english/wiki/kruskal-wallis-test-in-r )
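With 20k genes, the idea is to run one test per gene across all samples, split by the group factor. The following is a minimal, dependency-free Python sketch of that loop, assuming a hypothetical dict that maps gene names to per-sample values plus a parallel list of group labels. It computes only the one-way ANOVA F statistic; in practice the p-value (from an F distribution with k - 1 and n - k degrees of freedom) would come from a library routine such as scipy.stats.f_oneway, or from aov() in R.

```python
import math
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F statistic for one gene; groups is a list of value lists."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of samples around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        return math.inf  # no within-group spread (F is infinite, or undefined if the gene is constant)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def anova_per_gene(expr, labels):
    """expr: gene name -> values (one per sample); labels: group of each sample."""
    levels = sorted(set(labels))
    results = {}
    for gene, values in expr.items():
        groups = [[v for v, lab in zip(values, labels) if lab == level] for level in levels]
        results[gene] = anova_f(groups)
    return results

# Hypothetical toy data: 2 genes, 8 samples, 4 groups (2 subtypes x 2 subgroups)
labels = ["A1", "A1", "A2", "A2", "B1", "B1", "B2", "B2"]
expr = {
    "GENE_X": [1.0, 1.2, 5.0, 5.1, 9.0, 9.2, 2.0, 2.1],  # differs across groups
    "GENE_Y": [3.0, 3.1, 2.9, 3.0, 3.1, 2.9, 3.0, 3.0],  # roughly flat
}
f_stats = anova_per_gene(expr, labels)
for gene, f in sorted(f_stats.items(), key=lambda kv: -kv[1]):
    print(gene, round(f, 2))
```

Sorting genes by F (or, better, by adjusted p-value) gives a ranked list of the genes that differ most across the four groups, retrievable by name.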

Use the Kruskal-Wallis non-parametric test if your sample n is low and/or your data distribution deviates from the 'bell curve' (normal distribution).
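To show what kruskal.test() computes for a single gene, here is a dependency-free Python sketch: pool the samples, rank them (ties get averaged ranks), compare rank sums across groups, apply the usual tie correction, and read the p-value from a chi-square distribution with k - 1 degrees of freedom. The helper chi2_sf is a hand-written closed form for integer degrees of freedom, included only to avoid dependencies; scipy.stats.kruskal would normally be used instead. With very small groups the chi-square approximation is rough.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank of sorted positions i..j
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    tail = math.erfc(math.sqrt(x / 2))  # complete answer when df == 1
    if df == 1:
        return tail
    # Odd df >= 3: add sqrt(2x/pi) * exp(-x/2) * sum of x^i / (1*3*...*(2i+1))
    term = total = 1.0
    for i in range(1, (df - 1) // 2):
        term *= x / (2 * i + 1)
        total += term
    return tail + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total

def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H and its chi-square p-value for one gene."""
    values = [v for g in groups for v in g]
    n = len(values)
    ranks = average_ranks(values)
    rank_sum_term, start = 0.0, 0
    for g in groups:
        r = ranks[start:start + len(g)]
        rank_sum_term += sum(r) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n)
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction > 0:
        h /= correction
    return h, chi2_sf(h, len(groups) - 1)

h, p = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 3), round(p, 4))
```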

You can also do post-hoc non-parametric pairwise comparisons between your groups with Dunn's test, as I show here: A: Network/Pathway Analysis from Mass Spec data
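The linked answer works in R. As a rough illustration of what Dunn's test computes, here is a dependency-free Python sketch for one gene: it compares the mean ranks of every pair of groups (ranks taken over all samples pooled), scales each difference by a tie-corrected standard error to get a z-score, and converts it to a two-sided p-value. The Bonferroni adjustment across pairs is an assumption of this sketch; other adjustments (e.g. Benjamini-Hochberg) are also common.

```python
import math
from itertools import combinations

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def dunn_test(groups):
    """Pairwise Dunn's test with Bonferroni adjustment.

    groups: dict mapping group name -> list of values for one gene.
    Returns {(group_a, group_b): (z, adjusted_p)}.
    """
    names = list(groups)
    values = [v for name in names for v in groups[name]]
    n = len(values)
    ranks = average_ranks(values)
    # Mean rank of each group, using ranks computed over all samples pooled
    mean_rank, start = {}, 0
    for name in names:
        size = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + size]) / size
        start += size
    # Tie correction term: sum(t^3 - t) / (12 * (n - 1))
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / (12 * (n - 1))
    pairs = list(combinations(names, 2))
    results = {}
    for a, b in pairs:
        var = (n * (n + 1) / 12 - tie_term) * (1 / len(groups[a]) + 1 / len(groups[b]))
        z = (mean_rank[a] - mean_rank[b]) / math.sqrt(var)
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        results[(a, b)] = (z, min(1.0, p * len(pairs)))
    return results

for pair, (z, p_adj) in dunn_test({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}).items():
    print(pair, round(z, 3), round(p_adj, 4))
```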

Once you identify statistically significant genes, filter your data matrix for these and then regenerate your heatmap.
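With one test per gene you end up running about 20k tests, so the per-gene p-values are usually adjusted for multiple testing before filtering (in R, e.g. p.adjust(p, method = "BH")). The following Python sketch applies the Benjamini-Hochberg adjustment and keeps only the matrix rows whose adjusted p-value falls below a cutoff. The gene names, the data layout (a dict of gene -> values) and the 0.05 cutoff are illustrative assumptions.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def filter_genes(expr, pvalues, alpha=0.05):
    """Keep rows of expr (gene -> values) whose BH-adjusted p-value is below alpha."""
    genes = list(pvalues)
    adj = dict(zip(genes, bh_adjust([pvalues[g] for g in genes])))
    kept = {g: expr[g] for g in genes if adj[g] < alpha}
    return kept, adj

# Hypothetical example: 4 genes with per-gene p-values from ANOVA or Kruskal-Wallis
expr = {"G1": [1.0, 2.0], "G2": [3.0, 4.0], "G3": [5.0, 6.0], "G4": [7.0, 8.0]}
pvals = {"G1": 0.01, "G2": 0.04, "G3": 0.03, "G4": 0.5}
kept, adj = filter_genes(expr, pvals)
print(list(kept), {g: round(q, 4) for g, q in adj.items()})
```

The filtered rows (`kept`) are what would then go into the heatmap.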

Kevin



Please correct me if I am wrong, but I don't think ANOVA will solve my issue. I now realize that I didn't specify all the relevant details of the data. I have a big cohort study: 800 patients divided into 4 groups (2 cancer subtypes, each divided into 2 smaller groups), and each patient has sequenced and processed (normalized) expression profiles for 20k genes. When I read the description of ANOVA you provided, I couldn't find a way to A) analyze all 20k genes in all 800 patients, grouped by the disease-type factor, at once, and B) based on the results, filter the genes that are common/distinct between all 4 groups and retrieve them by name.

Reply by orzech_mag, 5.5 years ago


Indeed, the methods that I proposed test each gene independently across your groups: you run one test per gene, using all 800 samples split by the group factor. Once every gene has been tested, you know which genes differ across your groups and can retrieve them by name.

Alternatively, you can cluster all samples and genes together and then identify clusters in your data via various metrics, including:

ConsensusClustering
Gap statistic
Elbow method
Silhouette Method
M3C

...or you can just 'cut' the dendrogram tree with cutree() function.
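In R this would be hclust() followed by cutree(tree, k = ...), which returns one cluster label per observation. As a minimal sketch of the same idea, the Python code below builds an agglomerative tree and stops merging once k clusters remain, which is equivalent to cutting the tree at k groups. Average linkage and Euclidean distance are assumptions here (hclust defaults to complete linkage), and the O(n^3) loop is for illustration only, not for 20k genes.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage_cut(points, k):
    """Average-linkage agglomerative clustering, stopped when k clusters remain.

    points: list of numeric vectors (e.g. one expression profile per gene).
    Returns one label per point, numbered in order of first appearance.
    """
    clusters = [[i] for i in range(len(points))]
    dist = [[euclidean(p, q) for q in points] for p in points]
    while len(clusters) > k:
        best = None
        # Find the pair of clusters with the smallest average pairwise distance
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(points)
    for c, members in enumerate(sorted(clusters, key=min), start=1):
        for i in members:
            labels[i] = c
    return labels

# Hypothetical toy profiles: two tight pairs and one outlier
pts = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)]
print(average_linkage_cut(pts, 3))
```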

Another idea would be to perform lasso-penalised regression, which would allow you to analyse all genes together, and across all samples. RandomForest® is another idea.

Another idea is to build correlation networks.
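A simple form of a correlation network treats genes as nodes and connects two genes when their expression profiles across samples are strongly correlated. The Python sketch below uses Pearson correlation and a hard cutoff on |r|; the cutoff value and gene names are illustrative assumptions. Dedicated tools such as the WGCNA R package instead use soft-thresholded (weighted) correlations.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (fails if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_network(expr, threshold=0.8):
    """Edges (gene1, gene2, r) for every gene pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges

# Hypothetical profiles across 4 samples
expr = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8], "G3": [4, 3, 2, 1], "G4": [1, 3, 2, 4]}
for g1, g2, r in correlation_network(expr, threshold=0.9):
    print(g1, g2, round(r, 3))
```

Modules of densely connected genes in such a network can then be compared across the four groups.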

It depends on what, exactly, you are hoping to achieve.

Reply by Kevin Blighe, 5.5 years ago


Thank you, Kevin. It seems like there is a wide range of options; now I need to decide which one will be appropriate for me.

Reply by orzech_mag, 5.5 years ago


Thanks @Kevin for your nice explanation. I would appreciate it if you could look at my question here

Reply by Raheleh, 4.2 years ago