10.9 years ago
Adrian Pelin
Hello,
I have 2 lists of variants that affect different genes. If I run a KOG enrichment, I basically get the gene categories of the genes affected by each of the 2 variant lists.
Is there any way to test whether the 2 variant lists affect different gene categories?
Adrian
That sounds great, thank you for your answer. Any advice on tools?
EDIT: Would this approach in R work?
fisher.test(Convictions, alternative = "less")
Yes, that is the right strategy. You'll also need to use apply or a for loop to cycle through your categories (unless you are only interested in a small number)
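As a rough sketch of what cycling through the categories could look like in R: the counts, the totals, and the variable names (`counts`, `total1`, `total2`) are all made up for illustration, and the multiple-testing adjustment at the end is an optional addition rather than something required by the approach.

```r
# Hypothetical counts: genes per KOG category in each list (made-up numbers).
counts <- data.frame(
  category = c("RNA processing and modification",
               "Chromatin structure and dynamics",
               "Transcription"),
  list1 = c(104, 40, 75),
  list2 = c(30, 35, 60)
)

# Assumed totals of annotated genes in each list (also made up).
total1 <- 1000
total2 <- 800

# One 2x2 Fisher test per category:
# in-category vs. not-in-category, list 1 vs. list 2.
p_values <- sapply(seq_len(nrow(counts)), function(i) {
  tab <- matrix(c(counts$list1[i], total1 - counts$list1[i],
                  counts$list2[i], total2 - counts$list2[i]),
                nrow = 2)
  fisher.test(tab)$p.value
})
names(p_values) <- counts$category

# Optional: adjust for testing several categories (Benjamini-Hochberg).
p_adjusted <- p.adjust(p_values, method = "BH")
print(p_adjusted)
```

Each category gets its own 2x2 table, so a list that is simply larger does not by itself produce a small p-value.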
Well, if column 1 is gene list 1, column 2 is gene list 2, and the rows are the different functional categories of genes, then 1 test should be enough, right?
I am currently screening for these categories:
It depends upon your question.
If you were asking about the different distributions, then yes. However, I don't think this is what you want. For example, you would get a significant result if your gene list was simply twice as large. Likewise, the groups are probably not independent: I would bet "Chromatin structure and dynamics" has a lot of overlap with "Transcription".
For example, let's say the results above are for list #1. You also need to know the total number of genes used for the analysis (let's say that was 1000). For "RNA processing and modification", the counts for list #1 would be 104 and 896 (1000 - 104). You would then need the number of list #2 genes in this category (let's call this X) and the total number of list #2 genes used for the functional enrichment analysis (let's call this Y).
The comparison for this category is 104-to-896 versus X-to-(Y-X). You should do this for all classes A-Z, unless you know that only a couple of them showed statistically significant enrichment for either list #1 or list #2. I would say a different proportion in list #1 versus list #2 doesn't matter if neither varies from the background frequency.
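For a single category, the comparison described above could be set up roughly as follows. The list #1 numbers (104 of 1000) come from the example; the list #2 values standing in for X and Y are purely hypothetical placeholders.

```r
# "RNA processing and modification", list #1 (numbers from the example above).
in_cat_1 <- 104    # list #1 genes in the category
total_1  <- 1000   # list #1 genes used for the analysis

# List #2: hypothetical placeholder values for X and Y.
in_cat_2 <- 60     # X: list #2 genes in the category (assumed)
total_2  <- 900    # Y: list #2 genes used for the analysis (assumed)

# 2x2 table: 104-to-896 versus X-to-(Y-X).
tab <- matrix(c(in_cat_1, total_1 - in_cat_1,
                in_cat_2, total_2 - in_cat_2),
              nrow = 2,
              dimnames = list(c("in category", "not in category"),
                              c("list1", "list2")))

res <- fisher.test(tab)  # two-sided by default
print(tab)
print(res$p.value)
print(res$estimate)      # conditional estimate of the odds ratio
```

The default two-sided test asks whether the proportions differ in either direction; `alternative = "less"` or `"greater"` would only make sense with a directional hypothesis decided in advance.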
What if I develop a script that, rather than counting the number of genes falling in each category of processes, counts the number of unique KOGs found in those genes? That way it's fair game, because we are looking at the number of unique KOGs.
Some KOGs fall into multiple classes, so I would discard those.
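A minimal sketch of that counting idea in R, assuming an annotation table with one row per gene/KOG/class assignment. The table `annot`, its column names, and the KOG identifiers are invented for illustration; the real input format may well differ.

```r
# Hypothetical annotation table for one variant list.
annot <- data.frame(
  gene  = c("g1", "g2", "g3", "g4", "g5", "g5"),
  kog   = c("KOG0001", "KOG0001", "KOG0002", "KOG0003", "KOG0004", "KOG0004"),
  class = c("A", "A", "K", "B", "B", "K"),
  stringsAsFactors = FALSE
)

# Collapse to unique KOG/class pairs so each KOG is counted once per class,
# no matter how many genes carry it.
kog_class <- unique(annot[, c("kog", "class")])

# Discard KOGs that are assigned to more than one class.
n_classes <- table(kog_class$kog)
single_class_kogs <- names(n_classes)[n_classes == 1]
kept <- kog_class[kog_class$kog %in% single_class_kogs, ]

# Number of unique KOGs per class.
unique_kogs_per_class <- table(kept$class)
print(unique_kogs_per_class)
```

Running this once per variant list would give the per-class counts for each list, which could then be compared class by class.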