Hi there,
I would like to find genes correlated with poor prognosis. I am doing a simple survival analysis:
- divide patients into two groups by gene expression, using the median as the cutoff.
- find genes significantly correlated with overall survival time, using the coxph function from R's survival package.
- check whether the genes in my list are up- or down-regulated in cancer samples compared to normal samples.
- keep genes with a hazard ratio greater than 1 (the low-expression group lives longer) that are up-regulated in tumors, as well as genes with a hazard ratio less than 1 that are down-regulated in tumors.
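A minimal Python sketch of steps 1, 2 and 4 for a single gene, assuming one expression value, one survival time and one event flag (1 = death, 0 = censored) per patient, plus a log2 fold change (tumor vs. normal) from a separate differential-expression analysis. It is not a replacement for coxph: it fits a univariate Cox model on the 0/1 group by Newton-Raphson using Breslow's handling of tied times (R's coxph defaults to Efron's, so estimates can differ slightly with ties). The function names and the toy data are hypothetical.

```python
import math
from statistics import median


def median_split(expression):
    """Step 1: 1 = high group (above the median), 0 = low group."""
    cutoff = median(expression)
    return [1 if value > cutoff else 0 for value in expression]


def cox_binary(times, events, group, max_iter=50, tol=1e-9):
    """Step 2: univariate Cox model for a 0/1 covariate (Breslow ties).

    Returns (hazard_ratio, wald_p_value). hazard_ratio > 1 means the high
    group (group == 1) has higher hazard, i.e. the low group lives longer.
    """
    beta = 0.0
    for _ in range(max_iter):
        score = 0.0  # first derivative of the log partial likelihood
        info = 0.0   # observed information (minus the second derivative)
        for t_i, d_i, x_i in zip(times, events, group):
            if not d_i:
                continue  # censored patients only appear in risk sets
            # Risk set: patients still under observation at time t_i.
            s0 = s1 = 0.0
            for t_j, x_j in zip(times, group):
                if t_j >= t_i:
                    weight = math.exp(beta * x_j)
                    s0 += weight
                    s1 += x_j * weight
            mean_x = s1 / s0
            score += x_i - mean_x
            info += mean_x * (1.0 - mean_x)  # weighted variance of a 0/1 covariate
        if info <= 0.0:
            raise ValueError("not estimable: no events or only one group at risk")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    # Wald test; info is from the last iteration, essentially at the converged beta.
    z = beta * math.sqrt(info)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return math.exp(beta), p_value


def concordant(hazard_ratio, log2_fc):
    """Step 4: HR > 1 and up in tumor, or HR < 1 and down in tumor."""
    return (hazard_ratio > 1 and log2_fc > 0) or (hazard_ratio < 1 and log2_fc < 0)


# Toy data for six patients, invented for illustration only.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
expression = [8.2, 7.9, 3.1, 7.5, 2.4, 2.9]
group = median_split(expression)
hr, p = cox_binary(times, events, group)
print(group, hr, p, concordant(hr, log2_fc=1.3))
```

The sketch does not detect complete separation (every death in one group happening before any death in the other); in that case the partial likelihood has no finite maximum and the returned hazard ratio is meaningless, so check event counts per group before trusting it.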
Am I doing it right? Is the fourth step necessary? Must genes with a hazard ratio greater than 1 be up-regulated in tumor compared to normal tissue (otherwise the hazard ratio wouldn't make sense)?
Thank you!
I agree with the answer, but I would add that you need to take multiple testing into account! If you test every gene for correlation with survival, you are running about 20,000 hypothesis tests. The probability of finding something "statistically significant" purely by chance, without any real relationship between the gene and survival, is very high: at a 0.05 threshold you would expect around 1,000 false positives even if no gene were truly associated with survival. You need to correct for multiple testing to account for that.
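As a sketch of one common correction, here is the Benjamini-Hochberg false discovery rate adjustment in Python; it is intended to give the same values as R's p.adjust(p, method = "BH"). The p-values and the 0.05 cutoff below are made up for illustration, not a recommendation from this thread.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay
    # monotone in rank and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up Cox p-values for four genes, for illustration only.
raw = [0.01, 0.04, 0.03, 0.20]
q = benjamini_hochberg(raw)
significant = [gene for gene, value in enumerate(q) if value < 0.05]
print(q, significant)
```

Only the first gene survives here, even though three raw p-values are below 0.05; with 20,000 genes the difference between raw and adjusted p-values becomes much larger.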
That is indeed correct, bernatgel. Thanks!