Mike (5.8 years ago)
Hi all,
I have a list of genes (around 300) and I want to run a survival analysis to identify only the genes that are significantly associated with survival. I am using TCGA RSEM-normalized data with the survival package and the following command, but I'm not sure how to pick out the significant genes.
library(survival)
coxph(Surv(OS_MONTHS, Events) ~ ., data = merged_data[, c(2, 4:303)])
Output:
          coef exp(coef) se(coef)    z    p
Gene1  0.07092   1.07349  0.05348 1.33 0.18
Gene2  0.02332   1.02360  0.05813 0.40 0.69
Gene3  0.00175   1.00175  0.06734 0.03 0.98
Gene n ...
Thanks
Hey Mike, usually, one takes the log-rank p-value. The hazard ratio is given by exp(coef). Have you included all of your genes in the model? It may be better to test each gene independently and then, perhaps, construct a final model with just those genes that are statistically significant. I, of course, have a tutorial: Survival analysis with gene expression

Thanks Kevin, as always your help and code are really very helpful. Yes, I included all genes. You can see my command:
Is it correct? Sorry to ask again, but what do you mean by testing each gene independently and then constructing a final model with the selected genes? In the meantime, I am installing your package RegParallel and will follow your protocol.
Okay. If you use Windows, select a low number of cores with RegParallel; it runs on Windows but is more efficient on Linux / Mac. The survival part is also covered in the vignette: https://github.com/kevinblighe/RegParallel#survival-analysis-via-cox-proportional-hazards-regression
There is a difference between testing each gene independently and including them all in the same model from the beginning. When your formula is this:
...then the model treats this as an additive effect of the 3 genes: each gene's coefficient is 'adjusted' for the effects of the others (for a more rigorous description, you could talk to a statistician). The p-values that you get will differ from those you get when you do:
The RegParallel code will test each variable independently for you and then piece the results together into a single table. This is effectively what edgeR, DESeq2, and limma do, too, i.e., they test each gene independently.
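A minimal base-R sketch of the "test each gene independently" idea, using only the survival package rather than RegParallel. It assumes a data frame laid out like merged_data (a time column OS_MONTHS, an event column Events, and one column per gene); the data below are simulated, and the helper name univariable_cox, the choice to report the score (log-rank-type) test p-value from summary(), and the Benjamini-Hochberg adjustment across genes are illustrative assumptions, not RegParallel's actual output.

```r
library(survival)

# Simulated stand-in for merged_data (hypothetical): 100 patients, 5 genes,
# where only Gene1 is built to affect survival time.
set.seed(1)
n <- 100
gene_mat <- matrix(rnorm(n * 5), ncol = 5,
                   dimnames = list(NULL, paste0("Gene", 1:5)))
sim <- data.frame(
  OS_MONTHS = rexp(n, rate = 0.05 * exp(0.8 * gene_mat[, "Gene1"])),
  Events    = rbinom(n, 1, 0.8),  # 1 = event, 0 = censored (random here)
  gene_mat
)

# Hypothetical helper: fit one Cox model per gene and collect the hazard
# ratio, the Wald p-value and the score (log-rank-type) test p-value.
univariable_cox <- function(data, genes, time = "OS_MONTHS", event = "Events") {
  rows <- lapply(genes, function(g) {
    f <- as.formula(paste0("Surv(", time, ", ", event, ") ~ ", g))
    s <- summary(coxph(f, data = data))
    data.frame(gene      = g,
               HR        = s$coefficients[1, "exp(coef)"],
               wald_p    = s$coefficients[1, "Pr(>|z|)"],
               logrank_p = unname(s$sctest["pvalue"]),
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  # Optional: Benjamini-Hochberg adjustment across all genes tested
  out$logrank_padj <- p.adjust(out$logrank_p, method = "BH")
  out[order(out$logrank_p), ]
}

res <- univariable_cox(sim, genes = paste0("Gene", 1:5))
print(res)

# Genes passing an (arbitrary) adjusted p-value threshold of 0.05
sig_genes <- res$gene[res$logrank_padj < 0.05]
print(sig_genes)
```

With ~300 genes, some form of multiple-testing adjustment is usually worth considering before calling genes significant; the 0.05 cut-off here is only a placeholder.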
Once you have identified some key genes, you could still simply report each of them as independently and statistically significantly associated with survival. Note how the survival curve differs when you plot an independent (single-gene) model versus the additive model.
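A sketch of how a gene's hazard ratio can shift once it sits in an additive model with another gene. The data are simulated and hypothetical: Gene2 is built to be correlated with Gene1, but only Gene1 affects survival. The same comparison applies to a final model containing the genes selected from the independent tests.

```r
library(survival)

# Hypothetical simulated data: Gene2 is correlated with Gene1,
# but only Gene1 is built to affect survival time.
set.seed(2)
n <- 200
gene1 <- rnorm(n)
gene2 <- 0.7 * gene1 + rnorm(n, sd = 0.5)
sim <- data.frame(
  OS_MONTHS = rexp(n, rate = 0.05 * exp(0.8 * gene1)),
  Events    = rbinom(n, 1, 0.8),
  Gene1     = gene1,
  Gene2     = gene2
)

# Independent (univariable) model: Gene2 on its own
uni <- coxph(Surv(OS_MONTHS, Events) ~ Gene2, data = sim)

# Additive (multivariable) final model: each gene adjusted for the other
multi <- coxph(Surv(OS_MONTHS, Events) ~ Gene1 + Gene2, data = sim)

# Hazard ratio for Gene2 before and after adjusting for Gene1
hr_uni   <- unname(exp(coef(uni))["Gene2"])
hr_multi <- unname(exp(coef(multi))["Gene2"])
print(c(univariable = hr_uni, adjusted = hr_multi))

# Per-gene Wald p-values from the final additive model
print(summary(multi)$coefficients[, "Pr(>|z|)"])
```

With correlated genes like these, the univariable hazard ratio for Gene2 will typically look associated with survival, while the adjusted hazard ratio tends to move towards 1.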
Thank you very much. To use RegParallel, I need to upgrade R (I have R version 3.3.3).
Yes, this is a Bioconductor requirement: a package under development has to target the latest R version. RegParallel was only accepted last month, so it requires R 3.5. If you definitely need to use it, just obtain the development version directly from GitHub.
I have now removed the R version requirement for this in the development version. You can install it with:
It works perfectly, thank you very much!
In doing this, it would be a good idea to update all of your current packages with:
or