Here's the scenario.
I have a list of GO terms representative of an upregulated gene set.
For each unique GO term, my test set is the number of upregulated genes annotated with that term, and my test population is the total number of upregulated genes. My reference set is the number of genes expressed in the transcriptome that are annotated with that same GO term, and my reference population is the total number of genes expressed in the transcriptome. Can people confirm this is the way to go?
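To make the setup concrete, here is a minimal sketch of the 2x2 table and the one-tailed test for a single GO term. It is written in plain Python rather than R so that it needs no packages. The function names and the labels k, n, K, N are my own, and it assumes the upregulated genes are a subset of the expressed transcriptome (so the reference counts include the test counts). The upper-tail hypergeometric probability computed here is what a one-sided ("greater") Fisher's exact test gives for this table.

```python
from math import comb


def contingency_table(k, n, K, N):
    """2x2 table for one GO term.

    k: upregulated genes annotated with the term (test set)
    n: all upregulated genes (test population)
    K: expressed genes annotated with the term (reference set)
    N: all expressed genes (reference population)
    Assumes the upregulated genes are a subset of the expressed genes.
    """
    if not (0 <= k <= n <= N and k <= K <= N and n - k <= N - K):
        raise ValueError("counts are inconsistent")
    return [
        [k, n - k],                   # upregulated: with term, without term
        [K - k, (N - K) - (n - k)],   # not upregulated: with term, without term
    ]


def enrichment_p_value(k, n, K, N):
    """One-tailed p-value P(X >= k) under the hypergeometric distribution."""
    contingency_table(k, n, K, N)  # reuse the consistency check
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

Calling enrichment_p_value once per GO term gives the list of raw p-values that then goes into the multiple-testing correction.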
Second question: in the past I have combined all Biological Process (BP), Molecular Function (MF) and Cellular Component (CC) GO identifiers and run my multiple-testing correction (BH, FDR 0.05) on the three categories together. However, I'm having doubts: I've read that some people split the terms by ontology and correct each of the three separately. What's the correct way? My way, or separating the terms into three groups for the test?
My approach is a one-tailed Fisher's exact test in R with BH multiple-testing correction.
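The two correction strategies I am comparing can be sketched as follows. This is an illustrative plain-Python version, not my actual R code: bh_adjust is a hand-written Benjamini-Hochberg adjustment, the helper names are hypothetical, and the input is assumed to be a list of (GO identifier, ontology, raw p-value) tuples. It only shows how the two groupings differ mechanically; it does not settle which one is preferable.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def adjust_combined(results):
    """Correct all terms together, ignoring the ontology."""
    adj = bh_adjust([p for _, _, p in results])
    return {go_id: q for (go_id, _, _), q in zip(results, adj)}


def adjust_by_ontology(results):
    """Correct BP, MF and CC terms as three separate families."""
    groups = {}
    for go_id, ontology, p in results:
        groups.setdefault(ontology, []).append((go_id, p))
    out = {}
    for items in groups.values():
        adj = bh_adjust([p for _, p in items])
        for (go_id, _), q in zip(items, adj):
            out[go_id] = q
    return out


# Made-up identifiers and p-values, for illustration only.
example = [
    ("term1", "BP", 0.01),
    ("term2", "BP", 0.04),
    ("term3", "MF", 0.03),
    ("term4", "CC", 0.005),
]
combined = adjust_combined(example)
separate = adjust_by_ontology(example)
```

Because the BH adjustment depends on the number of tests in the family, the same raw p-value can end up with a different adjusted value under the two groupings.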
Can someone please provide confirmation or useful links? There are similar posts on here, but I feel they don't quite address my specific questions.
This is a non-model organism I've annotated.
Many thanks!