Hi all,
I have two kinds of gene lists. The first describes the status (Active, Bivalent, Repressive or Quiescent) of each gene, e.g.:
Gene name Status
A Active
B Bivalent
C Repressive
... ...
The other two lists contain predicted tumor suppressor genes (TSG) and oncogenes (OG), e.g.:
TSG
Gene name Score
A 0.0001
B 100
C 1
... ...
OG
Gene name Score
A 0.001
B 1
C 10
... ...
Then I matched the first gene list against each of the other two lists to see whether its genes are TSGs or OGs (regardless of the score). I get a table like this (the overlap is quite limited):
We can see that among the genes in the first list, there are more tumor suppressor genes than oncogenes (13 > 8). If I want to test whether repressive genes are indeed more associated with tumor suppressor genes than with oncogenes, how can I add a statistical test?
The first list has 1769 lines in total (excluding the header); the TSG list has 491 lines and the OG list has 501 lines.
You also need the background frequencies: how many tumor suppressor genes and oncogenes are there in total in your genome? Then your problem reduces to the following urn lottery: you put G balls into the basket, N labelled Tum. and M labelled Onc. Now you draw J < G balls from your lottery without putting them back. What is the probability of having n >= 7 labelled Tum. and m >= 6 labelled Onc. in your sample?
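The urn lottery described here is a multivariate hypergeometric tail probability, and it can be written down directly. The following is only a minimal sketch in plain Python: the function name `joint_tail` is made up for illustration, and the numbers in the usage line are toy values, since the real background size G is not given in this thread.

```python
from fractions import Fraction
from math import comb


def joint_tail(G, N, M, J, n_min, m_min):
    """P(at least n_min Tum. and at least m_min Onc.) when drawing J of G
    balls without replacement, with N balls labelled Tum. and M labelled Onc.
    (multivariate hypergeometric distribution)."""
    rest = G - N - M  # balls carrying neither label
    if rest < 0 or not 0 <= J <= G:
        raise ValueError("need N + M <= G and 0 <= J <= G")
    total = comb(G, J)  # number of equally likely samples
    favourable = 0
    for n in range(max(0, n_min), min(N, J) + 1):
        for m in range(max(0, m_min), min(M, J - n) + 1):
            # comb() returns 0 when more unlabelled balls are requested
            # than exist, so impossible combinations add nothing
            favourable += comb(N, n) * comb(M, m) * comb(rest, J - n - m)
    return float(Fraction(favourable, total))


# Toy numbers only (assumed, not taken from the question)
print(joint_tail(G=200, N=20, M=25, J=40, n_min=7, m_min=6))
```

Exact integer arithmetic keeps the sketch simple, but for lists with hundreds or thousands of genes the double loop gets slow, and a library routine would probably be the better choice.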
Thanks for the reply, Michael! I've edited the question, so it may be a bit different from the previous version. Actually, I use the second type of list (TSG/OG) to annotate the first list (the result list of my analysis).
You have two categorical lists, (Quiescent, Repressive, Bivalent and Active) and (Tumor Suppressor Genes, Oncogenes), and want to compare which factor has more relevance? Then test it with McNemar's test. See this page for more help.
No, the second list contains three factors: Tumor Suppressors, Oncogenes, and All other genes.
What are the scores in your second file?
The score in the second list is a predicted score (predicted from a large data set of mutation signatures). The higher the score, the more likely it is that the gene is a TSG or OG. But in my case, I am more interested in finding which genes in my gene list are predicted to be TSGs or OGs.
This question is incomplete. Are you asking whether repressive genes are more associated with tumor suppressors than any other gene is, or more associated with tumor suppressors than with oncogenes? I would do a Fisher's exact test or a regression, but first you need to define what you are looking for.
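As a rough sketch of the Fisher test mentioned here, the counts can be arranged in a 2x2 table (repressive vs. other status, TSG vs. OG) and tested with the hypergeometric distribution. The function `fisher_exact_2x2` is written for illustration only, and the counts in the example are hypothetical placeholders, not values from this thread. The two-sided p-value sums all tables that are no more probable than the observed one, which is one common convention.

```python
from fractions import Fraction
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for the table [[a, b], [c, d]].
    Returns (one-sided p-value for a large top-left cell, two-sided p-value)."""
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, col1)

    def prob(x):
        # probability of x in the top-left cell with all margins fixed
        return Fraction(comb(row1, x) * comb(total - row1, col1 - x), denom)

    lo = max(0, col1 - (total - row1))  # smallest feasible top-left count
    hi = min(row1, col1)                # largest feasible top-left count
    p_obs = prob(a)
    greater = sum(prob(x) for x in range(a, hi + 1))
    two_sided = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs)
    return float(greater), float(two_sided)


# Hypothetical counts: rows = repressive / other status, columns = TSG / OG
p_greater, p_two_sided = fisher_exact_2x2(6, 2, 7, 6)
print(p_greater, p_two_sided)
```

This table compares TSG with OG only; genes that are neither do not enter the test, so a different table would be needed if the comparison is against all other genes.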
I want to see whether repressive genes are more associated with tumor suppressors compared to oncogenes. Thanks for pointing out the incomplete part!
Thanks, but note that this way the "all genes" dataset is not taken into consideration.