I have two groups of genes, each containing over 2000 genes. I check the clinical outcome associated with each gene in these two groups by survival analysis using the “survival” package in R, and the model is coxph(Surv(time, censor) ~ exprs). Here time is the survival time (for patients who died) or the last follow-up time (for patients still alive), censor indicates whether each sample is dead or alive, and exprs is the gene expression value measured in RPKM.
Each model gives a P-value as well as a coefficient for the gene. A previous study said that “the Cox model also provides a coefficient for each term, which is related to its contribution to the hazard ratio. A positive coefficient indicates that the gene increases the hazard ratio, while a negative coefficient indicates that expression of the gene is protective.”
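To make the setup concrete, here is a minimal sketch of fitting one Cox model per gene and collecting the coefficient, hazard ratio, and P-value. The data are simulated, and the names exprs_mat, fit_one_gene, and res are hypothetical stand-ins rather than anything from the real analysis. It also assumes censor is coded 1 for dead and 0 for alive, which is the coding Surv() expects for the event indicator.

```r
library(survival)

set.seed(1)
n <- 100
# Simulated stand-ins for the real clinical data (assumption, not real data)
time   <- rexp(n, rate = 0.1)       # survival time or last follow-up time
censor <- rbinom(n, 1, 0.7)         # assumed coding: 1 = dead (event), 0 = alive
# Hypothetical genes x samples matrix of RPKM-like expression values
exprs_mat <- matrix(rlnorm(n * 5), nrow = 5,
                    dimnames = list(paste0("gene", 1:5), NULL))

# Fit the univariate Cox model for one gene and pull out the key numbers
fit_one_gene <- function(exprs) {
  fit <- coxph(Surv(time, censor) ~ exprs)
  s <- summary(fit)$coefficients
  c(coef = unname(s[1, "coef"]),        # log hazard ratio per unit of expression
    hr   = unname(s[1, "exp(coef)"]),   # hazard ratio
    p    = unname(s[1, "Pr(>|z|)"]))    # Wald test P-value
}

res <- as.data.frame(t(apply(exprs_mat, 1, fit_one_gene)))
res$prognostic <- res$p <= 0.05         # same cutoff as in the question
print(res)
```

Note that the coefficient is the log of the hazard ratio per unit increase in expression, so its size depends on the scale of exprs (raw RPKM versus log-transformed RPKM, for example).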
I marked every gene in each group with a P-value <= 0.05 as a prognostic gene. Then I plotted the distribution of the coefficients of the prognostic genes in each group as a boxplot. I found that the coefficients from group1 are significantly lower than those from group2 (wilcox.test in R). I interpreted this result as meaning that there are more protective genes or fewer harmful genes in group1, since a lower coefficient means either a more negative coefficient or a smaller positive coefficient.
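The comparison described here could be sketched roughly as follows. The per-gene results are simulated, and the names res, prog, wt, sign_tab, and ft are hypothetical. The Fisher test on the sign of the coefficient is an extra check that is not part of the original analysis; it is included only because it addresses the "more protective or fewer harmful genes" reading directly.

```r
set.seed(2)
# Hypothetical per-gene Cox results (simulated, not real data)
res <- data.frame(
  group = rep(c("group1", "group2"), each = 200),
  coef  = c(rnorm(200, mean = -0.2, sd = 0.5),
            rnorm(200, mean =  0.1, sd = 0.5)),
  p     = rbeta(400, 0.3, 1)            # skewed towards small P-values
)

# Keep only the genes called prognostic (P-value <= 0.05)
prog <- res[res$p <= 0.05, ]

# Wilcoxon rank-sum test on the coefficients, as in the question
wt <- wilcox.test(coef ~ group, data = prog)
print(wt$p.value)

# Extra check (assumption): compare the share of negative (protective)
# versus positive (harmful) coefficients between the two groups
sign_tab <- table(prog$group,
                  ifelse(prog$coef < 0, "protective", "harmful"))
ft <- fisher.test(sign_tab)
print(sign_tab)
print(ft$p.value)
```

The rank-sum test compares the whole distribution of coefficients, while the contingency table only counts signs, so the two can disagree when one group has a few very large coefficients.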
Is it meaningful to do this comparison? What does this result mean to you? Could you please tell me your interpretation?
Thanks.
Could you confirm how you defined your group1 and group2 individuals, please?
The definition of group1 and group2 is based on a specific genomic feature that is supported by many previous studies. In detail, I extracted genes that are under the control of two combinations of multiple histone modifications and DNA methylation patterns, and placed them into two groups.
It is meaningful. I am quite sure about it.
Performing a Wilcoxon test on the beta coefficients from the Cox model does not 'feel' right; however, if you are sure about it, then you should proceed with it. When you publish the work, reviewers will likely question it (if they are statistically minded). I would also report the difference in mean and median coefficient between the groups, and ensure that your confidence intervals are not large.
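As a rough sketch of that reporting, the code below computes the difference in mean and median coefficient between the groups, asks wilcox.test for a confidence interval on the location shift, and checks the width of the confidence interval for one gene's Cox coefficient. All data are simulated and the variable names are hypothetical. Which confidence intervals matter most here (the between-group difference or the per-gene coefficients) is my assumption, so both are shown.

```r
library(survival)

set.seed(3)
# Hypothetical coefficients of the prognostic genes in each group (simulated)
prog <- data.frame(
  group = rep(c("group1", "group2"), each = 80),
  coef  = c(rnorm(80, mean = -0.2, sd = 0.5),
            rnorm(80, mean =  0.1, sd = 0.5))
)

# Difference in mean and median coefficient (group1 minus group2)
by_group    <- split(prog$coef, prog$group)
diff_mean   <- mean(by_group$group1)   - mean(by_group$group2)
diff_median <- median(by_group$group1) - median(by_group$group2)

# Wilcoxon test with an estimate and 95% CI for the location shift
wt <- wilcox.test(coef ~ group, data = prog, conf.int = TRUE)
print(c(diff_mean = diff_mean, diff_median = diff_median))
print(wt$estimate)
print(wt$conf.int)

# Width of the 95% CI for a single gene's Cox coefficient (simulated data;
# censor is assumed to be coded 1 = dead, 0 = alive)
n      <- 100
time   <- rexp(n, rate = 0.1)
censor <- rbinom(n, 1, 0.7)
exprs  <- rlnorm(n)
fit    <- coxph(Surv(time, censor) ~ exprs)
ci     <- confint(fit)                 # 95% CI on the log hazard ratio scale
ci_width <- ci[1, 2] - ci[1, 1]
print(ci_width)
```

A gene can pass the P-value cutoff and still have a wide interval around its coefficient, so looking at the interval widths alongside the point estimates gives a sense of how much the boxplot reflects estimation noise.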