I've been working on survival analysis of gene mutations. The data, downloaded and merged from TCGA somatic mutation files (MAF), is:
SRCAP ZFHX4 AMER1 PCDHB8 AHNAK2 ...
are the genes selected by univariate Kaplan-Meier (KM) survival analysis and the log-rank test: patients were divided into WT and mutated groups by each gene's mutation status, the p-values were ranked, and p = 0.05 was used as the threshold.
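To make the screening step concrete, here is a minimal sketch of a per-gene log-rank test. It runs on simulated data, not the TCGA table: the data frame `dat`, the helper names `sim_gene` and `logrank_p`, and the mutation frequencies are all assumptions for illustration. Only `futime`, `fustat` and the gene names come from the model formula in this post.

```r
library(survival)

# Simulated stand-in for the merged clinical + mutation table (assumption).
set.seed(1)
n <- 200

# Hypothetical helper: a WT/Mutated factor with mutation frequency p.
sim_gene <- function(p) {
  factor(sample(c("Mutated", "WT"), n, replace = TRUE, prob = c(p, 1 - p)),
         levels = c("Mutated", "WT"))
}

dat <- data.frame(
  futime = rexp(n, rate = 0.1),   # follow-up time
  fustat = rbinom(n, 1, 0.6),     # 1 = event, 0 = censored
  SRCAP  = sim_gene(0.10),
  ZFHX4  = sim_gene(0.15)
)
genes <- c("SRCAP", "ZFHX4")

# Log-rank p-value comparing the WT and mutated groups for one gene.
logrank_p <- function(gene, data) {
  fit <- survdiff(as.formula(paste("Surv(futime, fustat) ~", gene)), data = data)
  pchisq(fit$chisq, df = length(fit$n) - 1, lower.tail = FALSE)
}

# Rank the p-values and keep genes below the 0.05 threshold.
pvals    <- sort(sapply(genes, logrank_p, data = dat))
selected <- names(pvals)[pvals < 0.05]
print(pvals)
```

On random data like this, `selected` will usually be empty; the point is only the shape of the loop.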
Now I need to include all the clinical features in the analysis along with these genes:
Surv(futime, fustat)~ gender+age+project+subtype+race_group+stage_group+SRCAP+ZFHX4+AMER1+PCDHB8+AHNAK2+DNAH5+NALCN+PAPPA+PCDH17+RELN+UGGT2+HYDIN
and the result:
                      coef  exp(coef)   se(coef)  robust se       z  Pr(>|z|)
genderMALE       9.020e-01  2.465e+00  3.819e-01  3.696e-01   2.441  0.014659 *
subtypeMissing   4.793e-01  1.615e+00  8.825e-01  1.045e+00   0.459  0.646364
subtypeMucinous  1.354e+00  3.874e+00  5.972e-01  6.053e-01   2.238  0.025250 *
race_groupWhite -6.223e-01  5.367e-01  3.921e-01  3.903e-01  -1.594  0.110878
SRCAPWT         -1.233e+00  2.914e-01  5.177e-01  6.516e-01  -1.892  0.058474 .
ZFHX4WT         -1.577e+00  2.065e-01  4.996e-01  5.621e-01  -2.806  0.005014 **
AMER1WT         -2.932e+00  5.332e-02  6.121e-01  5.547e-01  -5.285  1.26e-07 ***
AHNAK2WT         2.190e+00  8.932e+00  1.063e+00  9.183e-01   2.385  0.017097 *
DNAH5WT          2.011e+00  7.474e+00  7.732e-01  6.077e-01   3.310  0.000932 ***
NALCNWT         -8.528e-01  4.262e-01  4.790e-01  4.151e-01  -2.055  0.039905 *
RELNWT           2.063e+01  9.155e+08  5.425e+03  1.659e+00  12.435   < 2e-16 ***
UGGT2WT         -2.783e+00  6.185e-02  7.052e-01  5.688e-01  -4.893  9.95e-07 ***
HYDINWT          1.864e+00  6.450e+00  7.435e-01  7.284e-01   2.559  0.010499 *
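For anyone who wants to reproduce output of this shape, here is a minimal sketch of a Cox fit on simulated data. The data frame and its values are made up, and `robust = TRUE` is an assumption: the output above has a "robust se" column, but the exact call that produced it is not shown here. Row names such as `SRCAPWT` indicate that the mutated group is the reference level, so each gene's exp(coef) is the hazard ratio of WT relative to mutated.

```r
library(survival)

# Simulated stand-in data (assumption); only a few covariates for brevity.
set.seed(2)
n <- 300
dat <- data.frame(
  futime = rexp(n, rate = 0.1),
  fustat = rbinom(n, 1, 0.6),
  gender = factor(sample(c("FEMALE", "MALE"), n, replace = TRUE)),
  age    = round(rnorm(n, mean = 65, sd = 10)),
  # "Mutated" is the first level, so coefficients are reported as <gene>WT.
  SRCAP  = factor(sample(c("Mutated", "WT"), n, replace = TRUE, prob = c(0.15, 0.85)),
                  levels = c("Mutated", "WT")),
  ZFHX4  = factor(sample(c("Mutated", "WT"), n, replace = TRUE, prob = c(0.20, 0.80)),
                  levels = c("Mutated", "WT"))
)

# Multivariable Cox model with robust (sandwich) standard errors.
fit <- coxph(Surv(futime, fustat) ~ gender + age + SRCAP + ZFHX4,
             data = dat, robust = TRUE)

coefs <- summary(fit)$coefficients
print(coefs)
```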
I'm not convinced by the whole procedure or the result. How can the "Stage" factor not be important to survival? Besides, some genes' hazard ratios are incredibly high (RELNWT: 9.155e+08). I'm not sure whether the sparse, binary nature of the mutation data is the reason.
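One quick way to check the sparsity suspicion is to cross-tabulate a gene's mutation status against the event indicator. If one group has no events at all, the Cox coefficient for that gene has no finite estimate, and the fit typically reports a very large coef together with a very large se(coef). The sketch below uses made-up counts (a hypothetical gene mutated in 4 of 100 patients, none of whom has an event); it is not derived from the real data.

```r
# Made-up example (assumption): 96 WT patients with 40 events,
# 4 mutated patients with 0 events.
dat <- data.frame(
  fustat = c(rep(c(1, 0), times = c(40, 56)), rep(0, 4)),
  RELN   = factor(c(rep("WT", 96), rep("Mutated", 4)),
                  levels = c("Mutated", "WT"))
)

# Rows: mutation status; columns: event indicator (0 = censored, 1 = event).
tab <- table(status = dat$RELN, event = dat$fustat)
print(tab)

# An empty cell means one group has no events (or no censored cases).
has_empty_cell <- any(tab == 0)
print(has_empty_cell)
```

Running the same tabulation for each gene in the real table would show whether any mutated group is that small or event-free.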
What is the proper way to perform survival analysis based on mutation data? I would really appreciate an explanation. Thanks.