How To Determine If A Gene Is Active From Expression Data
14.7 years ago
Allpowerde ★ 1.3k

I have RMA (Robust Multi-Array) scores for the different genes (and their isoforms) on the Affymetrix chip. I want to know which of these genes are "active" (or in other words: are likely to produce enough protein products to have an effect). I'm not interested in them being differentially expressed or X-fold over- or under-expressed. All I want is the classification of them being likely "on" or "off".

So far I have log-transformed (base 10) the RMA scores and centered them (subtracted the median). I called all genes with a transformed score <0 inactive and all genes with a score >0 active.
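For reference, the thresholding described above can be sketched as follows. This is a minimal illustration, not a recommendation: the function name `classify_by_median` and the example scores are hypothetical, and the input is assumed to be a mapping from gene identifier to a positive RMA score.

```python
import math
import statistics

def classify_by_median(rma_scores):
    """Label each gene relative to the median of the log10-transformed scores.

    rma_scores: dict mapping a gene identifier to a positive score (assumed input format).
    """
    logged = {gene: math.log10(score) for gene, score in rma_scores.items()}
    median = statistics.median(logged.values())
    calls = {}
    for gene, value in logged.items():
        centred = value - median  # centre on the median
        if centred > 0:
            calls[gene] = "active"
        elif centred < 0:
            calls[gene] = "inactive"
        else:
            calls[gene] = "undetermined"  # exactly at the median
    return calls

# Hypothetical scores, for illustration only.
example = {"geneA": 4.0, "geneB": 7.5, "geneC": 11.2, "geneD": 6.1, "geneE": 9.8}
calls = classify_by_median(example)
```

By construction, this rule labels roughly half of the genes inactive, whatever the underlying biology.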

Does anyone have a better methodology?

gene • 7.7k views

I think it would help to elaborate on what "produce enough protein products to have an effect" means.


Sorry, I was too vague here. I am looking at the effects of a certain set of transcription factors in a certain tissue. There seem to be some interesting patterns of co-operation between them. Whether these TFs are able to interact in the first place depends on whether all of them are actually expressed in this tissue. That's what I want to find out with this exercise. Thanks for your help!

14.7 years ago
Nicojo ★ 1.1k

I would suggest the following question instead of the one you're asking:

Can you actually determine if a gene is "active" (i.e. translated into protein) from [gene] expression data?

And I'll point you towards people who have published papers about it:

These are just a few papers that seem critical of such a correlation. That is not to say that there is no good correlation for any gene. But I would be very surprised if you could make a general rule about it without checking every cell type, every tissue type and every gene to see whether such a correlation is acceptable or not.

Now, if you do a PubMed search for the terms "correlation mRNA protein", you will find many papers that check for such correlations, but mostly for specific genes in specific tissues (often for cancer diagnostic purposes).

If you do find papers that claim such correlations genome-wide using microarray data, I'd be highly suspicious of them.

So, obviously, you cannot set "a" cut-off for determining this. My personal experience tells me that you can have gene transcription with no protein expression following it... Unfortunately, I have not published it yet :(




Thanks for this detailed reply! Those are really great references you pointed me to! However, determining how much protein is actually produced from the transcribed mRNA goes into too much detail for this project.


The point is that no amount of mRNA will tell you if the protein is present and in what amount... And even less if there is a biological impact by the proteins produced.


I agree, this statement in my question was quite confusing. What I'm after is just a rough classification of the proteins as "probably there" or not. (I'm looking forward to reading the publication you hinted at.)


Ahh I'm struggling with quite a few things more urgent. I'm afraid it might stay on the shelf for a while (I hope not forever though)... Wet lab can be EXTREMELY frustrating :(

14.7 years ago

You're right in thinking that your methodology isn't a very good representation of the system. mRNAs (and their protein products) have a huge dynamic range. Some are expressed constantly at extremely low levels, and at the other extreme you'll have genes that are highly expressed, but only for a short period of time. Taking the median level as the dividing line between on and off is going to give you huge numbers of false negatives (genes that are actually being transcribed and translated, but that you'll classify as "off").

I'd look at what the background noise level is, then run some stats to determine which probes give you signal significantly above that level. Any gene meeting that criterion should probably be considered "on". I suspect that may not divide the set as nicely as you'd hope, though.
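One way to make this concrete is an empirical p-value against a set of negative-control intensities. The sketch below assumes that such a set can be extracted from the array (which probes qualify is an assumption here and depends on the chip design) and that control and gene values are on the same scale. The function names, the `alpha` cutoff and all numbers are hypothetical.

```python
from bisect import bisect_left

def empirical_p_values(expression, background):
    """For each gene, estimate P(background >= observed value) from the controls."""
    if not background:
        raise ValueError("need at least one background value")
    bg = sorted(background)
    n = len(bg)
    p_values = {}
    for gene, value in expression.items():
        # Count background values greater than or equal to this gene's value.
        n_greater_equal = n - bisect_left(bg, value)
        # Add-one correction so the estimate is never exactly zero.
        p_values[gene] = (n_greater_equal + 1) / (n + 1)
    return p_values

def call_on(expression, background, alpha=0.05):
    """Call a gene 'on' (True) when its empirical p-value is below alpha."""
    p_values = empirical_p_values(expression, background)
    return {gene: p < alpha for gene, p in p_values.items()}

# Hypothetical log-scale intensities, for illustration only.
background = [2.0 + 0.1 * i for i in range(40)]   # 40 negative-control values
expression = {"tf_a": 9.3, "tf_b": 4.25, "tf_c": 6.4}
calls = call_on(expression, background)
```

With n control values the smallest attainable p-value is 1/(n+1), so at `alpha=0.05` at least 20 controls are needed before any gene can be called "on". If many genes are tested, some multiple-testing correction would probably be wanted on top of this.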

Maybe if you tell us more about what exactly you're trying to do, we can offer more constructive advice.


I agree: treating the background level as noise, and anything at that level as not expressed, sounds like a good approach.


That sounds like the approach I'm after. How would I determine the noise level though?

14.7 years ago
Will 4.6k

Sounds like you're trying to find genes that actually switch from on to off (or vice versa) depending on cell type, condition, etc. Not all genes behave this way ... some are graded (like a dimer switch). There are numerous papers that discuss techniques for finding genes with "bimodal" expression patterns. Since their expression is a mixture of two patterns, they are likely to have an "on" and an "off" state.

This article explains the technique and includes Matlab code that should do the whole thing for you.

Human and mouse switch-like genes share common transcriptional regulatory mechanisms for bimodality
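To illustrate the general idea only (this is not the article's Matlab code, and not necessarily its method), here is a rough Python sketch that fits one and two Gaussians to a single gene's expression across samples and compares them by BIC. All function names, the `min_sd` floor and the example values are assumptions made for the illustration.

```python
import math
import statistics

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_log_likelihood(xs, weights, means, sds):
    total = 0.0
    for x in xs:
        density = sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))
        total += math.log(max(density, 1e-300))  # guard against log(0)
    return total

def fit_two_gaussians(values, n_iter=200, min_sd=0.05):
    """Fit a two-component 1-D Gaussian mixture with a basic EM loop."""
    xs = sorted(values)
    half = len(xs) // 2
    lower, upper = xs[:half], xs[half:]
    # Initialise from the lower and upper halves of the sorted data.
    weights = [0.5, 0.5]
    means = [statistics.fmean(lower), statistics.fmean(upper)]
    sds = [max(statistics.pstdev(lower), min_sd), max(statistics.pstdev(upper), min_sd)]
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each point.
        resp0 = []
        for x in xs:
            p0 = weights[0] * normal_pdf(x, means[0], sds[0])
            p1 = weights[1] * normal_pdf(x, means[1], sds[1])
            resp0.append(p0 / (p0 + p1) if p0 + p1 > 0 else 0.5)
        # M-step: re-estimate weight, mean and sd of each component.
        for k, resp in enumerate((resp0, [1.0 - r for r in resp0])):
            mass = sum(resp)
            weights[k] = mass / len(xs)
            if mass < 1e-9:
                continue  # component has vanished; keep its old mean and sd
            means[k] = sum(r * x for r, x in zip(resp, xs)) / mass
            var = sum(r * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / mass
            sds[k] = max(math.sqrt(var), min_sd)  # floor avoids collapsing onto one point
    return weights, means, sds

def looks_bimodal(values, min_sd=0.05):
    """True when the two-Gaussian fit has a lower BIC than the one-Gaussian fit."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least four samples")
    sd = max(statistics.pstdev(values), min_sd)
    ll_one = mixture_log_likelihood(values, [1.0], [statistics.fmean(values)], [sd])
    ll_two = mixture_log_likelihood(values, *fit_two_gaussians(values, min_sd=min_sd))
    bic_one = 2 * math.log(n) - 2 * ll_one   # parameters: mean, sd
    bic_two = 5 * math.log(n) - 2 * ll_two   # parameters: 2 means, 2 sds, 1 weight
    return bic_two < bic_one

# Hypothetical log-scale expression of two genes across 12 samples.
switch_like = [3.1, 2.9, 3.0, 3.2, 2.8, 3.05, 8.9, 9.1, 9.0, 9.2, 8.8, 9.05]
steady = [5.8, 6.1, 6.0, 5.9, 6.2, 6.05, 5.95, 6.15, 5.85, 6.02, 6.12, 5.92]
```

Here `looks_bimodal(switch_like)` should come out true and `looks_bimodal(steady)` false. Note that this kind of test needs a gene's values across many samples or conditions, so it cannot classify genes from a single array of one tissue.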
