Dear all,
I would like to ask for your assistance with the following problem: I have a subgroup of genes which contain a certain motif, and I would like to know if the presence of this motif significantly changes the expression of those genes. I have RNA-Seq data for a control (WT), and two over-expression and mutant lines.
So far I have come up with one approach, which I will outline below, but I'm uncertain whether it is correct, and I would like to know if there are other methods for testing the significance of my subgroup.
My planned approach is as follows:
- Obtain the significantly differentially expressed genes with edgeR, by comparing WT with the over-expression and mutant conditions.
- Divide the genes into three categories, based on the edgeR output. The categories are either +1, if a gene is significantly differentially expressed AND up-regulated, -1 if significantly differentially expressed AND down-regulated and 0 if not significant.
- Perform Chi-square analysis based on the categorized data, comparing the frequencies/percentages of the subgroup with the frequencies/percentages of all the genes (including those of the subgroup).
- Do bootstrapping analysis with replacement and get the one-sided p-value.
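To make the categorization step concrete, here is a minimal Python sketch of what I mean. The gene names, values and the 0.05 FDR cutoff are all made up for illustration; `logFC` and `FDR` stand for the corresponding columns of the edgeR results table.

```python
from collections import Counter

def categorize(log_fc, fdr, alpha=0.05):
    """Return +1 (significant and up), -1 (significant and down) or 0 (not significant).

    alpha is an assumed FDR cutoff, not a value taken from my real analysis.
    """
    if fdr >= alpha:
        return 0
    return 1 if log_fc > 0 else -1

# Hypothetical edgeR-style results: gene -> (logFC, FDR). Values are invented.
results = {
    "geneA": (2.1, 0.001),
    "geneB": (-1.7, 0.010),
    "geneC": (0.3, 0.600),
    "geneD": (1.2, 0.030),
    "geneE": (-0.2, 0.900),
    "geneF": (-2.4, 0.002),
}
motif_genes = {"geneA", "geneD", "geneE"}  # hypothetical subgroup containing the motif

categories = {gene: categorize(lfc, fdr) for gene, (lfc, fdr) in results.items()}
all_counts = Counter(categories.values())
motif_counts = Counter(categories[gene] for gene in motif_genes)

# Category, count within the subgroup, count among all genes
for label in (1, -1, 0):
    print(label, motif_counts[label], all_counts[label])
```

These per-category counts are the frequencies I would then feed into the comparison between the subgroup and all genes.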
And that's about it. I don't have much experience in this kind of analysis and my statistics are not that strong, so please correct me if I made any mistakes or if you know of a better way of testing!
Thanks in advance for everyone taking the time to read this and to anyone who is willing/capable of helping me with my problem.
Thank you for your quick response! Is it possible to do a single hypergeometric test covering both directions, or should I do two hypergeometric tests for each sample, one for only the up-regulated genes and one for the down-regulated genes? Or, alternatively, should I just compare the number of significantly differentially expressed genes against the others?
--- EDIT ---
I will try hypergeometric testing, but I was wondering if there is anything wrong with my proposed approach?
You can do it both ways, but the general approach is to compare the number of differentially expressed genes.
I don't see the point of using the chi-square test.
As you describe your experimental design, you want to know whether your up-regulated or down-regulated genes (or both together) are significantly enriched for the motif. You can test these three cases separately with the hypergeometric test.
So step 1 of your approach sounds good; then try the hypergeometric test. Good luck!
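In case it helps, here is a rough sketch of the one-sided hypergeometric enrichment test in plain Python. The function name and the counts are made up for illustration; you would plug in your own numbers for each of the three cases (up, down, or up/down together).

```python
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """Upper-tail p-value P(X >= k) of the hypergeometric distribution.

    N: total number of genes tested
    K: number of genes containing the motif
    n: number of differentially expressed genes (e.g. up-regulated only)
    k: number of differentially expressed genes that contain the motif
    """
    upper = min(K, n)  # the overlap cannot exceed either group size
    total = comb(N, n)  # number of ways to draw n genes out of N
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Hypothetical counts, not taken from any real data set
p_up = hypergeom_enrichment_p(N=20, K=5, n=4, k=2)
print(p_up)
```

A small p-value would suggest that the motif genes are over-represented among the differentially expressed genes. If you run the three cases separately, keep in mind that you are then doing several tests on the same data.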
Thank you! I'm working on it now