Identifying The Biological Relevance Of Candidate Genes
11.3 years ago
Sudeep ★ 1.7k

Hi all,

To identify the biological relevance of a set of candidate genes (from differential expression analysis, clustering, or any other method), the first thing that comes to mind is an "enrichment analysis": something along the lines of a hypergeometric test, Fisher's exact test, or a chi-square test, or a somewhat more sophisticated approach such as Gene Set Enrichment Analysis (GSEA). I have been searching to see whether anybody has worked on an alternative approach for giving "biological meaning" to a set of candidate genes, but with very little success. Have you come across any alternative approaches? What are your thoughts on this?

biology enrichment • 2.6k views
11.3 years ago

I recently wrote a blog post about enrichment analysis: http://blog.nextgenetics.net/?e=94

I went over how a simple hypergeometric distribution works in relation to enrichment analysis. My problem with these methods is that the statistical combinatorial space is not representative of the possible biological combinatorial space.

Most of these enrichment tests work on some version of the question: "If I were to randomly pick X items out of this population, what is the chance that more than Y items of a certain type will be among them?" A more biological example: "If I were to randomly designate 100 genes in my transcriptome as differentially expressed, what is the chance that more than 50 of those 100 genes are related to 'cell cycle'?"

The statistical p-value is essentially calculated as: "out of all combinations of X items from the population, what fraction contain more than Y items of that type?"
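To make that counting concrete, here is a minimal sketch in Python using only the standard library. The function name `tail_probability` and all the numbers are made up for illustration. It counts "more than Y" literally, as phrased above, whereas enrichment tools usually report the chance of seeing the observed count or more.

```python
from math import comb

def tail_probability(population, of_type, picked, more_than):
    """Fraction of all ways to pick `picked` items from `population`
    that contain more than `more_than` items of the given type."""
    all_combinations = comb(population, picked)
    most_possible = min(of_type, picked)
    matching = sum(
        # ways to choose k items of the type, times ways to choose the rest
        comb(of_type, k) * comb(population - of_type, picked - k)
        for k in range(more_than + 1, most_possible + 1)
    )
    return matching / all_combinations

# Hypothetical numbers: 20000 genes, 500 annotated as 'cell cycle',
# 100 genes designated as differentially expressed, more than 50 hits.
p_value = tail_probability(20000, 500, 100, 50)
print(p_value)
```

The denominator is exactly the "all combinations of X items in the population" space, which is the part of the calculation questioned in the rest of this post.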

However, is it valid to use the combinatorial space of "all combinations of X items in the population"? How much of that space is actually biologically relevant? Can all possible combinations actually occur in nature? Wouldn't some combinations kill the organism or be almost impossible to induce? If we reduce the combinatorial space by looking only at biologically possible cases, our p-value will increase. The question, to me, is how much of the statistical combinatorial space is biologically relevant. If most of it is, then our p-values are good. If not, then our p-values are too low.


"My problem with these methods is that the statistical combinatorial space is not representative of the possible biological combinatorial space." I completely agree with you here.

Isn't reducing the combinatorial space, as you said, better suited to a wet-lab experiment? Maybe some sort of functional genomics assay to understand the function of a gene? I have a feeling that enrichment tests such as the hypergeometric test are highly subjective: quite often the enrichment results for the same set of candidate genes can be made "more biologically relevant" or "less biologically relevant" just by playing around with the size of the gene universe (the total number of white + black balls in the urn).
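A rough way to see the universe-size effect is to hold everything else fixed and change only the total number of balls in the urn. The sketch below uses made-up numbers and a hypothetical helper `enrichment_p`. It also assumes the number of annotated genes stays the same as the universe grows, which will not always hold for real annotation sets.

```python
from math import comb

def enrichment_p(universe, annotated, selected, overlap):
    """Chance of `overlap` or more annotated genes among `selected`
    genes drawn at random from a universe of `universe` genes."""
    most_possible = min(annotated, selected)
    matching = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, most_possible + 1)
    )
    return matching / comb(universe, selected)

# Same candidate list (100 genes, 8 of them in a 200-gene category),
# tested against three different choices of universe size.
for universe in (5000, 10000, 20000):
    print(universe, enrichment_p(universe, 200, 100, 8))
```

With these numbers the expected overlap by chance drops from 4 to 2 to 1 as the universe grows, so the same 8 hits look increasingly "significant" even though nothing about the candidate list has changed.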
