How to combine the expression of multiple genes for survival analysis?
8.2 years ago
gokce.ouz ▴ 70

Hi All,

We have combined 4 GEO datasets, removed the batch effect using ComBat, and extracted the genes we are interested in. Each gene has multiple probes. However, 1 cohort is missing 3 of the probes (although it has different probes for those genes). We would like to do a survival analysis by combining multiple genes. Our aim is to compare the survival of high-expressing vs low-expressing patients. However, we have multiple questions:

Genes   A1  A2  A3  B1  B2  B3  C1  C2  C3
Batch1  NA  6.1 7.6 5.0 4.4 NA  6.4 6.4 NA
Batch2  5.9 5.9 8.3 5.2 5.1 5.1 6.7 6.3 6.3
Batch3  6.4 6.4 8.2 5.1 5.3 5.3 6.7 6.7 6.7
Batch4  5.6 7.1 6.3 6.3 8.1 6.5 5.4 6.0 4.9
  1. Should we combine the probes of the same gene? If yes, which way do you suggest: average, median, or something else?
  2. When we are combining the probes, what should we do about the missing probes? Should we totally exclude A1, B3 & C3 from our analysis, or combine A1, A2, A3 for Batches 2, 3, 4 and combine only A2, A3 for Batch 1?
  3. After combining the probes, we would like to see the effect of the 3 combined genes on survival. To get their combined expression, is it OK to use Avg(A,B,C) + 1/2 SD? Or what do you suggest?
  4. As a next step, how should we define the threshold for high/low expression? Is it OK to use a Z-score on the combined 3-gene expression to set the threshold? 0 would be the baseline, with negative values defining low expression and positive values defining high expression of the combined genes.

Thanks in advance,

Gokce

Microarray Survival Analysis • 3.2k views
7.1 years ago

Hey Gokce,

There are no definitive answers to your questions because there are no set standards for what you are asking.

I presume that this is microarray data and that you have normalised it by RMA or gcRMA.

1) Should we combine the probes of the same gene? If yes, which way do you suggest: average, median, or something else?

There is no single answer. Some people will favour the average/mean, whilst others prefer the median. Both have their own advantages and disadvantages, and both are open to criticism. In this situation, I don't see any problem in taking the mean. My logic is that, considering you will have already performed normalisation, highly variable probes will have already been managed and possibly excluded during normalisation.

2) When we are combining the probes, what should we do about the missing probes? Should we totally exclude A1, B3 & C3 from our analysis, or combine A1, A2, A3 for Batches 2, 3, 4 and combine only A2, A3 for Batch 1?

If only 1 of 3 values is missing, then just get the mean of the 2 probes for which you do have values.
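A minimal sketch of that, in plain Python, using the gene A values from the table in the question. The names `probes_a` and `collapse_probes` are just for illustration, and `float("nan")` stands in for the NA entries:

```python
import math
from statistics import mean

NA = float("nan")  # placeholder for the probes reported as NA

# Gene A probe values (A1, A2, A3) per row of the table in the question.
probes_a = {
    "Batch1": [NA, 6.1, 7.6],
    "Batch2": [5.9, 5.9, 8.3],
    "Batch3": [6.4, 6.4, 8.2],
    "Batch4": [5.6, 7.1, 6.3],
}

def collapse_probes(values):
    """Mean of the non-missing probe values; NaN if every probe is missing."""
    present = [v for v in values if not math.isnan(v)]
    return mean(present) if present else NA

# One combined value per row: Batch1 uses A2 and A3 only, the others use all 3.
gene_a = {batch: collapse_probes(vals) for batch, vals in probes_a.items()}
```

Swapping `mean` for `statistics.median` gives the median-based variant, if that is preferred.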

3) After combining the probes, we would like to see the effect of the 3 combined genes on survival. To get their combined expression, is it OK to use Avg(A,B,C) + 1/2 SD? Or what do you suggest?

I don't understand your question, particularly why you would want to add 1/2 SD.

4) As a next step, how should we define the threshold for high/low expression? Is it OK to use a Z-score on the combined 3-gene expression to set the threshold? 0 would be the baseline, with negative values defining low expression and positive values defining high expression of the combined genes.

I would start by dividing the expression range for each gene into tertiles, as follows:

  • lower tertile = low expression
  • middle tertile = normal expression
  • upper tertile = high expression

Thus, you will have 3 lines in your survival curve.
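As a rough sketch of the tertile split, assuming one expression value per sample for a given gene. The values below are made up for illustration and `tertile_groups` is a hypothetical helper, not part of any survival package:

```python
from statistics import quantiles

def tertile_groups(expression):
    """Label each sample low / normal / high by tertile of its expression."""
    lower, upper = quantiles(expression, n=3)  # the two tertile cut points
    labels = []
    for value in expression:
        if value <= lower:
            labels.append("low")
        elif value <= upper:
            labels.append("normal")
        else:
            labels.append("high")
    return labels

# Hypothetical per-sample expression values for one gene.
expression = [4.8, 5.1, 5.5, 5.9, 6.2, 6.4, 6.9, 7.3, 7.8]
groups = tertile_groups(expression)
```

The resulting labels would then be the grouping variable for the survival curves, one curve per group.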

My main worry for your data actually relates to the mention of having corrected for batch. How can you be sure that you have adequately corrected for this? Batch correction is an area that comes under question time and time again, and one must be sure that one is not just introducing further bias/confounding information in the attempt to correct for batch. Have you done a PCA to gauge the correction?
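To illustrate the kind of PCA check I mean, here is a small sketch using NumPy on a made-up samples x genes matrix (it does not run ComBat; `pca_scores` and the toy data are assumptions for illustration only). If samples still separate by batch along PC1 or PC2 after correction, the batch effect has not been adequately removed:

```python
import numpy as np

def pca_scores(matrix, n_components=2):
    """Project samples (rows) onto the top principal components via SVD."""
    centered = matrix - matrix.mean(axis=0)  # centre each gene (column)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # fraction of variance per component
    return u[:, :n_components] * s[:n_components], explained[:n_components]

# Toy data: 3 samples x 4 genes, duplicated with a constant offset of 2.0
# to mimic a second batch that has NOT been corrected.
base = np.array([
    [5.0, 6.0, 7.0, 5.5],
    [5.2, 6.1, 6.8, 5.4],
    [4.9, 5.9, 7.1, 5.6],
])
uncorrected = np.vstack([base, base + 2.0])
batch = ["b1", "b1", "b1", "b2", "b2", "b2"]

scores, explained = pca_scores(uncorrected)
for label, (pc1, pc2) in zip(batch, scores):
    print(f"{label}  PC1={pc1:+.2f}  PC2={pc2:+.2f}")
```

In this toy case the two batches fall on opposite sides of PC1, which is what an uncorrected batch effect tends to look like. On real data you would run the same check before and after correction and plot PC1 against PC2 coloured by batch.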

Hope that this helps

Kevin
