I exposed mouse cells to a control and a stimulus, then used RNA-seq to get a list of genes that go up or down after exposure to the stimulus. Out of the roughly 20,000 mouse genes, I have around 400 whose expression is different.
I'm not sure if this is possible, and it's a bit of a shot in the dark, but is there a way to say how likely it is to see this number of genes change by chance? In other words, are there some stats I can use to show that these 400 genes are enriched and unlikely to have come up randomly?
Thanks everyone for the advice. I'm not very knowledgeable about all this, and most of the analysis was already done by the company that did the RNA-seq so I was just looking at an Excel spreadsheet that the company provided. My original plan was to just look at pathways in common, GO terms etc, but my supervisors were advising me to look at some sort of "stats".
Thanks for pointing me in the right direction. I've had a look and I think it might be a bit out of my meager abilities and I'll have to consult a bioinformatician.
The p-value given in DE analysis is not a good indication of the magnitude of difference between your samples. Given enough replicates, two samples with tag counts of 1000 and 999 can be determined to be differentially expressed (which they technically are) with a good p-value.
What do you consider different in your samples? 2X fold change? 3X?
Also think about how this translates to the phenotype because ultimately, that's what you care about. Some genes might need just a small change in expression to produce a phenotype (for example, transcription factors that trigger some kind of downstream amplification cascade), while other genes might need a large change in expression to cause a phenotype.
And there are statistical tests you can try. The hypergeometric test is probably the most basic one for testing enrichment.
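To make the hypergeometric idea concrete, here is a minimal sketch in plain Python. It asks: if 400 genes were drawn at random from about 20,000, how likely is it that at least k of them fall in a given annotated set (say, a GO term or pathway)? All the numbers below (a pathway of 200 genes, 15 of them in the DE list) are made up for illustration, and the function name is hypothetical, not from any library.

```python
from math import comb

def hypergeom_enrichment_pvalue(total_genes, annotated_genes, de_genes, de_annotated):
    """Upper-tail hypergeometric p-value: P(overlap >= de_annotated)
    when de_genes are drawn without replacement from total_genes,
    of which annotated_genes belong to the gene set of interest."""
    max_overlap = min(annotated_genes, de_genes)
    all_draws = comb(total_genes, de_genes)
    # Count the draws that contain i annotated genes, for every i >= observed overlap.
    tail_draws = sum(
        comb(annotated_genes, i) * comb(total_genes - annotated_genes, de_genes - i)
        for i in range(de_annotated, max_overlap + 1)
    )
    return tail_draws / all_draws

# Hypothetical numbers: 20,000 genes, a 200-gene pathway,
# 400 DE genes, 15 of which are in the pathway.
expected_overlap = 400 * 200 / 20000  # overlap expected by chance alone
p = hypergeom_enrichment_pvalue(20000, 200, 400, 15)
print(f"expected overlap by chance: {expected_overlap:.1f}, p-value: {p:.3g}")
```

Note that the background (here 20,000) should really be the set of genes that could have been detected in the experiment, and if you test many GO terms or pathways this way, the resulting p-values need a multiple-testing correction as well.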
Yes indeed, the hypergeometric test is the basic one for your enrichment, but it is important to know how you found these 400 genes that are behaving differently. Did you do any DE analysis with any of the widely available tools?
If you have replicates, then proceed as @Goutham Atla pointed out. But it is important to choose a tool that does the DE analysis for you. Keep in mind that there will be variability between samples as well, so a widely used DE tool is needed to call differential expression reliably. Once that is done, you will see that the DE genes come with p-values, and you can then apply a multiple-testing correction to those p-values to judge significance across all the genes tested. Then you can take the genes with an adjusted p-value below some threshold (typically 0.05 or 0.01) and further filter on fold change. Decide what fold change you will consider, be it 2X or 3X as Damian pointed out, and then see whether these genes show any GO enrichment or whether any specific pathways are triggered by the up/down regulation in a way that translates to the phenotypic condition.
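As a rough sketch of the correction-then-filter step, the snippet below applies the Benjamini-Hochberg procedure (one common multiple-testing correction; the DE tools usually report it for you as the adjusted p-value) and then keeps genes that pass both an adjusted p-value cutoff and a fold-change cutoff. The gene names, p-values and log2 fold changes are invented for illustration, and the 0.05 and 2X cutoffs are just the example thresholds mentioned above.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical DE table: (gene, raw p-value, log2 fold change)
genes = [
    ("geneA", 0.0001, 2.3),
    ("geneB", 0.02, 0.2),
    ("geneC", 0.03, -1.5),
    ("geneD", 0.6, 3.0),
]

adj = benjamini_hochberg([p for _, p, _ in genes])

ADJ_P_CUTOFF = 0.05   # example threshold
MIN_ABS_LOG2FC = 1.0  # |log2FC| >= 1 corresponds to a 2X change in either direction

selected = [
    name
    for (name, _, log2fc), q in zip(genes, adj)
    if q < ADJ_P_CUTOFF and abs(log2fc) >= MIN_ABS_LOG2FC
]
print(selected)
```

In this toy table, geneB is significant but barely changes, and geneD changes a lot but is not significant, so neither survives both filters.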
There are multiple RNA-Seq analysis tools that you can use for DE analysis. You can take a look at this link.
Hope this helps.
I would anyway use any one of the widely used tools to do the same.
Few things to check: