Question

What is the best test to compare the expression of two different sets of genes in the same transcriptome?

1


7.9 years ago

Pas ▴ 30

Hello, I have two sets of genes that I identified in a previous analysis. Let's call these sets "A" and "B". A contains 10 genes ("g1", "g2", "g3", "g4", ..., "g10") and B contains 14 genes ("g11", "g12", ..., "g24"). In each sample (transcriptome), I want to compare the expression distributions of these two sets, A vs. B. These gene sets are predictive of survival: I know that when A is strongly expressed relative to B, the patient has poor survival. I thought of using the Kolmogorov-Smirnov test (ks.test) to compare the distributions of A vs. B. It works very well: all the patients whose p-value is significant show a different survival. Do you think that ks.test is statistically correct here? Do you recommend other methods to classify each individual patient based on these two sets of genes? Any other suggestions are more than welcome. Thank you

RNA-Seq R gene • 2.7k views

ADD COMMENT • link updated 5.8 years ago by ritarebollo ▴ 70 • written 7.9 years ago by Pas ▴ 30

score 1 · Answer 1 · 2017-09-23

1


7.9 years ago

Jean-Karim Heriche 27k

What you're doing is not entirely clear. Do you use the KS test to assess whether the 10 values of set A and the 14 values of set B come from the same distribution? If so, I think the KS test is inappropriate here because the values in set A and set B are not mutually independent (they are genes measured in the same sample). In this case, a permutation test would seem more appropriate.
However, if the goal is to classify the samples/patients, you could try various machine learning approaches using the vectors of 24 gene values as input data. If you have training data (i.e. vectors with ground truth label), then build a classifier. If you don't have or do not want to use training data then try clustering. Which particular method/algorithm to use is up to you but could depend on details you haven't given.

ADD COMMENT • link 7.9 years ago by Jean-Karim Heriche 27k

0


Thank you Jean-Karim. I am aware of the assumptions underlying the KS test, which is why I posted here. The data set is small, so I can't use any ML approach. Do you know of an alternative to the KS test that doesn't assume independence? In other words: if you have only 1 sample in which you want to compare 2 sets of genes, which test do you suggest?

ADD REPLY • link 7.9 years ago by Pas ▴ 30

0


You can still use ML approaches when the data set is small; it depends on how small is small. For example, if you want to associate a probability with two classes (e.g. good/bad prognosis), you could try logistic regression. If you still want to do a statistical test for some difference between set A and set B, go for a permutation test. You could keep the KS statistic if it works well for you, but compute the p-value using permutations.
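
A minimal sketch of the KS-statistic-with-permutations idea, in Python using only the standard library. The function names `ks_statistic` and `ks_permutation_pvalue`, the number of permutations and the seed are illustrative assumptions, not something specified in this thread. It also assumes that shuffling the set labels across the 24 genes of one sample is an acceptable null model, which is worth checking for your data.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The largest gap between two step functions occurs at an observed value.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_permutation_pvalue(a, b, n_perm=10000, seed=0):
    """Permutation p-value for the KS statistic (set labels are shuffled)."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted = ks_statistic(pooled[:len(a)], pooled[len(a):])
        # Small tolerance so floating-point ties count as "at least as extreme".
        if permuted >= observed - 1e-12:
            hits += 1
    # The +1 counts the observed labelling itself, so p is never exactly 0.
    return observed, (hits + 1) / (n_perm + 1)
```

For one sample, `a` would hold the 10 expression values of set A and `b` the 14 values of set B, for example `d, p = ks_permutation_pvalue(values_a, values_b)`, where `values_a` and `values_b` are placeholders for your own data. The p-value cannot be smaller than `1 / (n_perm + 1)`, so `n_perm` limits the resolution.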

ADD REPLY • link 7.9 years ago by Jean-Karim Heriche 27k

0


Thank you Jean Karim!

ADD REPLY • link 7.9 years ago by Pas ▴ 30

score 0 · Answer 2 · 2019-10-30

0


5.8 years ago

ritarebollo ▴ 70

Hello, I am hijacking this post (sorry!). I am also comparing two gene sets in the same transcriptome. But in my case, I have 150 genes in set A and all the other genes in set B (10000 genes or so...). I would like to see whether the genes in set A are highly expressed compared to the rest of the genes. I'm not sure I can actually do this... I was thinking of drawing random lists from set B with the same size as set A (so roughly 150 genes). I also thought of removing all genes that have 0 counts... Does anyone have an idea about this? Is there a tool that I somehow missed? Thank you very much! R
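
The random-list idea described above can be sketched as a resampling test, here in Python with only the standard library. Everything named below is an illustrative assumption: `expr` is a hypothetical dict mapping gene IDs to expression values in one sample, `set_a` is the collection of gene IDs in set A, and the median is just one possible summary of "how highly expressed" a list is.

```python
import random
from statistics import median


def resampling_pvalue(expr, set_a, n_draws=10000, seed=0):
    """Compare the median expression of set A with random same-size gene lists.

    expr  : dict of gene ID -> expression value in a single sample
    set_a : gene IDs of interest; all other genes in expr form the background
    """
    rng = random.Random(seed)
    set_a = set(set_a)
    genes_a = [g for g in expr if g in set_a]
    background = [g for g in expr if g not in set_a]
    observed = median(expr[g] for g in genes_a)
    hits = 0
    for _ in range(n_draws):
        # Draw a random list from the background, same size as set A.
        draw = rng.sample(background, len(genes_a))
        if median(expr[g] for g in draw) >= observed:
            hits += 1
    # Fraction of random lists at least as highly expressed as set A.
    return observed, (hits + 1) / (n_draws + 1)
```

If genes with 0 counts are to be excluded, one option is to filter `expr` before calling the function, for example `expr = {g: v for g, v in expr.items() if v > 0}`. The background must contain at least as many genes as set A for the sampling step to work.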

ADD COMMENT • link 5.8 years ago by ritarebollo ▴ 70

1


If the answer to the original question doesn't apply to your case, you should create another question. Anyway, you need to give more information on the data you have. If you have expression values, one of the things you can start with is to look at the histograms of values for A and B. An obvious difference will be more convincing than a statistical test.
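
As a rough sketch of that first look, the following Python snippet (standard library only) bins both sets on the same edges and reports the fraction of genes per bin, so that sets of very different sizes remain comparable. The function name `shared_bin_histograms` and the default of 10 bins are assumptions made for illustration.

```python
def shared_bin_histograms(a, b, n_bins=10):
    """Histogram two lists of values on common, equal-width bins.

    Returns the bin edges and, for each list, the fraction of its values
    falling in each bin.
    """
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    # Fall back to a width of 1.0 if every value is identical.
    width = (hi - lo) / n_bins or 1.0

    def fractions(values):
        counts = [0] * n_bins
        for v in values:
            # The maximum value is placed in the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
        return [c / len(values) for c in counts]

    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, fractions(a), fractions(b)
```

The two lists of fractions can be printed side by side or passed to any plotting tool. For count data, binning log-transformed values such as log2(count + 1) is a common choice, since raw counts are usually strongly skewed.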

ADD REPLY • link 5.8 years ago by Jean-Karim Heriche 27k