Subgroup significance testing
8.8 years ago
pavenhuizen ▴ 90

Dear all,

I would like to ask for your help with the following problem: I have a subgroup of genes that contain a certain motif, and I would like to know whether the presence of this motif significantly changes the expression of those genes. I have RNA-Seq data for a control (WT), and two over-expression and mutant lines.

So far I have come up with one approach, outlined below, but I'm unsure whether it is correct, and I would like to know if there are other methods of testing the significance of my subgroup.

My planned approach is as follows:

  1. Obtain the significantly differentially expressed genes with edgeR, by comparing WT with the over-expression and mutant conditions.
  2. Divide the genes into three categories based on the edgeR output: +1 if a gene is significantly differentially expressed AND up-regulated, -1 if it is significantly differentially expressed AND down-regulated, and 0 if it is not significant.
  3. Perform a chi-square test on the categorized data, comparing the frequencies/percentages in the subgroup with the frequencies/percentages across all genes (including those of the subgroup).
  4. Perform a bootstrap analysis (resampling with replacement) and obtain the one-sided p-value.
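
For illustration, steps 2 and 3 above could be sketched roughly as follows in Python with SciPy. This is only a minimal sketch under stated assumptions: the gene counts are made up, `category_table` is a hypothetical helper, and the motif subgroup is compared against the remaining genes rather than against all genes, so that the two rows of the table do not overlap.

```python
import numpy as np
from scipy.stats import chi2_contingency


def category_table(categories, in_subgroup):
    """Build a 2 x 3 count table.

    Rows: genes with the motif, genes without the motif.
    Columns: category -1 (down), 0 (not significant), +1 (up).
    """
    categories = np.asarray(categories)
    in_subgroup = np.asarray(in_subgroup, dtype=bool)
    levels = (-1, 0, 1)
    return np.array([
        [int(np.sum((categories == level) & mask)) for level in levels]
        for mask in (in_subgroup, ~in_subgroup)
    ])


# Made-up example: 100 motif genes followed by 1000 genes without the motif.
categories = [1] * 30 + [0] * 60 + [-1] * 10 + [1] * 100 + [0] * 800 + [-1] * 100
in_subgroup = [True] * 100 + [False] * 1000

table = category_table(categories, in_subgroup)

# Chi-square test of independence between motif presence and DE category.
chi2, p_value, dof, expected = chi2_contingency(table)
print(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3g}")
```

A small p-value here only says that the distribution over the three categories differs between the two groups; it does not say in which direction.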

And that's about it. I don't have much experience in this kind of analysis and my statistics are not that strong, so please correct me if I made any mistakes or if you know of a better way of testing!

Thanks in advance to everyone who takes the time to read this, and to anyone willing and able to help me with my problem.

statistics RNA-Seq • 2.4k views
8.8 years ago
Benn 8.3k

I would suggest a hypergeometric test, as is done for gene set enrichment analysis (with GO terms).
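
As a rough sketch of what such a test looks like in practice, here is a version using `scipy.stats.hypergeom`. The counts are invented for illustration and `enrichment_pvalue` is a hypothetical helper name; the universe is assumed to be all genes that were tested for differential expression.

```python
from scipy.stats import hypergeom


def enrichment_pvalue(n_total, n_motif, n_de, n_overlap):
    """One-sided p-value P(X >= n_overlap).

    n_total   : number of genes in the universe
    n_motif   : number of genes carrying the motif
    n_de      : number of differentially expressed genes
    n_overlap : number of DE genes that carry the motif
    """
    # sf(k) is P(X > k), so use n_overlap - 1 to include n_overlap itself.
    return hypergeom.sf(n_overlap - 1, n_total, n_motif, n_de)


# Made-up counts: 20000 genes, 500 with the motif, 1000 DE, 60 in both.
p = enrichment_pvalue(20000, 500, 1000, 60)
print(f"p = {p:.3g}")
```

With these numbers about 25 motif genes would be expected among the DE genes by chance, so observing 60 gives a very small p-value. This is the same calculation as a one-sided Fisher's exact test on the corresponding 2 x 2 table.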

Thank you for your quick response! Is it possible to do a hypergeometric test for two conditions, or should I do two hypergeometric tests for each sample, one for the up-regulated genes and one for the down-regulated genes? Or, alternatively, should I just compare the number of significantly differentially expressed genes against the others?

--- EDIT ---

I will try hypergeometric testing, but I was wondering if there is anything wrong with my proposed approach?

You can do it both ways, but the general approach is to compare the number of differentially expressed genes.

I don't see the point of using the chi-square test.

As you describe your experimental design, you want to know whether your up- or down-regulated genes (or up and down together) are significantly enriched for the motif. You can test these three cases separately with a hypergeometric test.

So step 1 of your approach sounds good, then try the hypergeometric test. Good luck!
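
To make the three separate tests concrete, here is a minimal sketch in Python. The gene IDs and DE calls are made up, `motif_enrichment` is a hypothetical helper, and the +1 / -1 / 0 calls are assumed to come from the edgeR results for one comparison against WT.

```python
from scipy.stats import hypergeom


def motif_enrichment(all_genes, motif_genes, de_calls):
    """Run three one-sided hypergeometric tests: up, down, and up or down.

    all_genes   : iterable of all tested gene IDs (the universe)
    motif_genes : iterable of gene IDs that carry the motif
    de_calls    : dict mapping gene ID -> +1 (up), -1 (down) or 0 (not significant)
    """
    universe = set(all_genes)
    motif = set(motif_genes) & universe
    groups = {
        "up": {g for g in universe if de_calls.get(g, 0) == 1},
        "down": {g for g in universe if de_calls.get(g, 0) == -1},
    }
    groups["up_or_down"] = groups["up"] | groups["down"]

    results = {}
    for name, de_set in groups.items():
        overlap = len(de_set & motif)
        # P(X >= overlap) when drawing len(de_set) genes from the universe.
        results[name] = hypergeom.sf(
            overlap - 1, len(universe), len(motif), len(de_set)
        )
    return results


# Made-up data: 1000 genes, the first 100 carry the motif.
all_genes = [f"g{i}" for i in range(1000)]
motif_genes = all_genes[:100]
de_calls = {g: 0 for g in all_genes}
for g in all_genes[:40] + all_genes[100:160]:
    de_calls[g] = 1      # 100 up-regulated genes, 40 of them with the motif
for g in all_genes[900:]:
    de_calls[g] = -1     # 100 down-regulated genes, none with the motif

results = motif_enrichment(all_genes, motif_genes, de_calls)
for name, p in results.items():
    print(f"{name}: p = {p:.3g}")
```

The same function would be called once per comparison (each over-expression or mutant line against WT), using the DE calls from that comparison.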

Thank you! I'm working on it now.