Question

How to know if high expression of a gene is really a good independent marker for poor prognosis in cancer?


5.5 years ago

curious ▴ 810

This is kind of a general question that I have been curious about for a while. Please consider my example:

I have a gene that seems like it should be associated with pro-cancer effects based on the literature.

I go ahead and mine RNA-seq samples from cancer patients and code them as "high expressers" or "low expressers" of my gene.

I do some survival analysis using univariate Cox regression and find that high expression of my gene is associated with significantly reduced overall survival.

My question is: how do I actually know that this gene is an independent prognosticator, and that I am not falling into the "correlation does not equal causation" trap?

How do I know that, when selecting my population of "high expressers" of my gene of interest, I am not inadvertently over-representing some other gene that is a much more powerful and independent prognosticator?

Thank you

survival cox regression RNA-Seq tcga

updated 5.5 years ago by Getting there ▴ 120 • written 5.5 years ago by curious ▴ 810

score 1 · Answer 1 · 2019-05-11

I would start from what is already known about the function of the gene: is it a known regulator of an established tumor suppressor or oncogene? A KEGG search might help with that (to follow Kevin's example, here is p53: https://www.kegg.jp/kegg-bin/show_pathway?ko04110+K04451). If there's no clear answer, but there is other supporting literature showing a correlation with survival (you can use cBioPortal to look at a lot of cancer data; here I looked at BRCA2 in breast cancer), then maybe it's worth starting some functional experiments, taking RNA-seq data, and seeing whether it lines up with protein expression, cell survival data, etc.

score 0 · Answer 2 · 2019-05-10

'Most' genes exhibit altered expression in cancer, but most of these changes are likely unrelated [directly] to the primary mechanism that drives the cancer. As we know, certain genes, like TP53, are the main drivers of tumourigenesis. Heightened TP53 expression will trigger a cellular cascade that alters the expression of many other genes downstream, but the altered expression of these downstream genes, on its own, is not what is driving tumourigenesis. Even so, the altered expression of these downstream genes can be seen as a biomarker or 'proxy' of tumourigenesis.

Some independent validation of your finding would help. For example, search GEO for microarray datasets with survival data and try to replicate the finding in those.
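A minimal sketch of what such a replication check could look like, assuming you have already extracted per-patient expression, follow-up time, and death/censoring status from an independent cohort. All numbers below are invented toy data, not real GEO values, and `logrank_test` is a hypothetical helper name. To stay dependency-free it uses a two-group log-rank test rather than Cox regression; for a single high/low split the two approaches ask essentially the same question.

```python
from statistics import median


def logrank_test(times, events, groups):
    """Two-group log-rank test.

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if the patient was censored
    groups : 1 for "high expresser", 0 for "low expresser"
    Returns (chi-square statistic with 1 d.f., observed deaths in the
    high group, deaths expected in the high group under no difference).
    """
    observed = 0.0
    expected = 0.0
    variance = 0.0
    # Walk through each distinct time at which at least one death occurred.
    for t in sorted({tt for tt, e in zip(times, events) if e == 1}):
        n = sum(1 for tt in times if tt >= t)  # at risk, both groups
        n1 = sum(1 for tt, g in zip(times, groups) if tt >= t and g == 1)
        d = sum(1 for tt, e in zip(times, events) if tt == t and e == 1)
        d1 = sum(1 for tt, e, g in zip(times, events, groups)
                 if tt == t and e == 1 and g == 1)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed - expected) ** 2 / variance if variance > 0 else 0.0
    return chi2, observed, expected


# Invented toy replication cohort (assumption: NOT real data).
expression = [2.1, 8.4, 3.3, 9.0, 1.7, 7.5, 2.9, 8.8]
times = [60, 12, 55, 8, 70, 15, 48, 20]   # months of follow-up
events = [0, 1, 1, 1, 0, 1, 0, 1]         # 1 = died, 0 = censored

# Same coding rule as in the discovery cohort: split at the median.
cutoff = median(expression)
groups = [1 if x > cutoff else 0 for x in expression]

chi2, observed, expected = logrank_test(times, events, groups)
same_direction = observed > expected  # more deaths than expected in "high"
print(f"chi2={chi2:.2f}, observed={observed:.0f}, expected={expected:.2f}")
```

A statistic above roughly 3.84 corresponds to p < 0.05 at one degree of freedom, but what matters for replication is that the effect is in the same direction as in the original cohort and that the high/low cutoff rule was fixed in advance rather than tuned on the new data. With only a handful of patients, as in this toy example, the chi-square approximation is unreliable.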

Also check the expression of your gene in Genotype-Tissue Expression (GTEx) data, to make sure that the gene is not simply highly expressed in the normal tissue in which this cancer occurs.
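As a rough illustration of the GTEx check, here is a small sketch that compares the gene's expression in tumour samples against normal samples of the same tissue. The TPM values are made up, `median_log2_fold_change` is a hypothetical helper, and the sketch assumes the tumour and normal values were processed through the same pipeline, which is not automatically true when combining data from different projects.

```python
from math import log2
from statistics import median


def median_log2_fold_change(tumour_tpm, normal_tpm, pseudocount=1.0):
    """Difference of median log2(TPM + pseudocount), tumour minus normal."""
    tumour = median(log2(x + pseudocount) for x in tumour_tpm)
    normal = median(log2(x + pseudocount) for x in normal_tpm)
    return tumour - normal


# Invented values for one gene in one tissue (assumption: NOT real data).
normal_tpm = [3, 7, 1, 3]        # e.g. normal-tissue samples
tumour_tpm = [31, 63, 15, 127]   # e.g. tumour samples from the same tissue

lfc = median_log2_fold_change(tumour_tpm, normal_tpm)
print(f"median log2 fold change (tumour vs normal): {lfc:.2f}")
```

A value near zero would suggest the gene is expressed at a similar level in normal tissue, so "high expression" in the tumour cohort may reflect the tissue of origin more than the cancer itself.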