Question

How to find key apoptotic markers across subtypes of a colorectal cancer dataset?

6.3 years ago

bio94 ▴ 60

I performed CMS and CRIS classification on the dataset GSE14333 and added the resulting CMS and CRIS labels to my phenotype table.

I am wondering how to proceed from here to find the key apoptotic markers across the subtypes. I am fairly new to this, so I would appreciate any help.

Many thanks

head(GSE14333_pheno_new)
          X Location DukesStage Age Gender DFSTime DFS_group DFSCens AdjXRT AdjCTX
1 GSM358387   Rectum          B  54      M    9.96      poor       0      Y      Y
2 GSM358392    Right          B  38      F   17.95      poor       1      N      Y
3 GSM358395    Right          B  78      F   22.02      poor       1      N      Y
4 GSM358396     Left          B  65      F   22.38      poor       0      Y      Y
5 GSM358397     Left          B  65      F   22.38      poor       0      Y      Y
6 GSM358399     Left          B  56      F   25.21      poor       0      Y      Y
  RF.CMS1.posteriorProb RF.CMS2.posteriorProb RF.CMS3.posteriorProb RF.CMS4.posteriorProb
1                  0.20                  0.34                  0.40                  0.06
2                  0.46                  0.06                  0.03                  0.45
3                  0.76                  0.02                  0.03                  0.19
4                  0.10                  0.78                  0.00                  0.12
5                  0.01                  0.95                  0.04                  0.00
6                  0.35                  0.42                  0.22                  0.01
  RF.nearestCMS RF.predictedCMS predict.label2 dist.to.template dist.to.cls1.rank  nominal.p
1          CMS3            <NA>         CRIS-B        0.7331209                68 0.00019996
2          CMS1            <NA>         CRIS-A        0.8965833                52 0.00739852
3          CMS1            CMS1         CRIS-B        0.8559375                80 0.00019996
4          CMS2            CMS2         CRIS-C        0.7944693               111 0.00019996
5          CMS2            CMS2         CRIS-C        0.8465627               120 0.00179964
6          CMS2            <NA>         CRIS-D        0.9366855               148 0.00719856
       BH.FDR Bonferroni.p
1 0.000672593    0.0369926
2 0.010214375    1.0000000
3 0.000672593    0.0369926
4 0.000672593    0.0369926
5 0.002684947    0.3329334
6 0.010013035    1.0000000

updated 6.3 years ago by Kevin Blighe 88k • written 6.3 years ago by bio94 ▴ 60

score 0 · Answer 1 · 2018-08-22

You could build predictive models for each CMS and CRIS classifier. Please take a look here:

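Essentially, you test each gene's ability to 'predict' the outcome, i.e., CMS or CRIS. Because each label has more than two classes, a single model per gene would strictly be a multinomial logistic regression (e.g., nnet::multinom()). With glm() and family="binomial", a multi-level factor response is modelled as first level versus all others, so code one subtype versus the rest and repeat per subtype:

# df is a hypothetical data frame: phenotype table merged with per-sample expression (gene1, gene2, ...)
# CRIS-A versus the rest; repeat for each subtype of interest
glm(I(predict.label2 == "CRIS-A") ~ gene1, data = df, family = "binomial")
glm(I(predict.label2 == "CRIS-A") ~ gene2, data = df, family = "binomial")
et cetera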
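The per-gene testing above can be sketched as a loop. This is only a minimal illustration on simulated data, using a one-versus-rest binomial model per gene and a Benjamini-Hochberg correction; the gene symbols are placeholders for whatever apoptosis gene list you choose, the helper `screen_gene` is a hypothetical name, and nothing here comes from GSE14333 itself.

```r
# Simulated data only: 120 samples, 4 CRIS-like classes, 5 placeholder genes.
set.seed(1)
n <- 120
subtype <- factor(sample(c("CRIS-A", "CRIS-B", "CRIS-C", "CRIS-D"), n, replace = TRUE))
genes <- c("CASP3", "BAX", "BCL2", "FAS", "TP53")  # placeholder gene list (assumption)
expr <- matrix(rnorm(n * length(genes)), nrow = n, dimnames = list(NULL, genes))

# Shift one gene upwards in CRIS-A so that the screen has a signal to find.
expr[subtype == "CRIS-A", "CASP3"] <- expr[subtype == "CRIS-A", "CASP3"] + 2

# Hypothetical helper: fit one binomial model (class versus rest) for one gene
# and return the Wald p-value of the gene coefficient.
screen_gene <- function(x, is_class) {
  fit <- glm(is_class ~ x, family = binomial)
  coef(summary(fit))["x", "Pr(>|z|)"]
}

is_A <- as.integer(subtype == "CRIS-A")              # 1 = CRIS-A, 0 = any other class
p_raw <- apply(expr, 2, screen_gene, is_class = is_A) # one p-value per gene (column)
p_adj <- p.adjust(p_raw, method = "BH")               # multiple-testing correction

results <- data.frame(gene = genes, p = p_raw, p_adj = p_adj)
results[order(results$p_adj), ]
```

With real data, `expr` would be your samples-by-genes expression matrix restricted to the apoptosis genes, and you would repeat the screen for each CMS or CRIS class in turn.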

When you get a final list of statistically significant genes from this, include them in a combined model and test it via R2 shrinkage and ROC analysis, as shown: A: Resources for gene signature creation
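As a rough sketch of the combined-model step, the example below fits two simulated genes in one binomial model and estimates the ROC AUC on held-out samples. It is not the procedure from the linked post: the AUC is computed with the rank-based (Mann-Whitney) formula in base R, the `auc` helper and all variable names are hypothetical, and the R2 shrinkage step is not covered.

```r
# Simulated data only: one subtype (1) versus the rest (0), two informative genes.
set.seed(2)
n <- 200
is_class <- rbinom(n, 1, 0.3)
geneA <- rnorm(n) + 1.5 * is_class   # higher in the subtype
geneB <- rnorm(n) - 1.0 * is_class   # lower in the subtype
dat <- data.frame(is_class, geneA, geneB)

# Fit on one half and evaluate on the other, so the AUC is not computed
# on the same samples that were used for fitting.
train <- dat[1:100, ]
test <- dat[101:200, ]
fit <- glm(is_class ~ geneA + geneB, data = train, family = binomial)
prob <- predict(fit, newdata = test, type = "response")

# Hypothetical helper: rank-based AUC, i.e. the probability that a randomly
# chosen positive sample scores higher than a randomly chosen negative one.
auc <- function(score, label) {
  r <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc_test <- auc(prob, test$is_class)
auc_test
```

A single split like this is noisy for small cohorts; cross-validation or bootstrapping would give a more stable estimate.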

Note that you technically don't have to use gene expression as the predictors. You can also use other clinical parameters, e.g.:

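# one subtype versus the rest (here CRIS-A), using columns of the phenotype table
glm(I(predict.label2 == "CRIS-A") ~ DukesStage, data = GSE14333_pheno_new, family = "binomial")
glm(I(predict.label2 == "CRIS-A") ~ DFSTime, data = GSE14333_pheno_new, family = "binomial")
glm(I(predict.label2 == "CRIS-A") ~ Location, data = GSE14333_pheno_new, family = "binomial")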
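For a categorical clinical variable, the per-level Wald p-values are awkward to read, so a likelihood-ratio test of the whole term may be more convenient. The sketch below assumes simulated values under the real column names `Location` and `DFSTime`; the `is_class` column and the effect sizes are invented for illustration.

```r
# Simulated phenotype table: column names follow the post, values are made up.
set.seed(3)
n <- 150
pheno <- data.frame(
  Location = factor(sample(c("Left", "Right", "Rectum"), n, replace = TRUE)),
  DFSTime = round(runif(n, 1, 120), 2)
)

# Simulated label: membership of the subtype is more likely for right-sided tumours.
p_class <- ifelse(pheno$Location == "Right", 0.7, 0.2)
pheno$is_class <- rbinom(n, 1, p_class)

# One subtype versus the rest, with clinical covariates as predictors.
fit <- glm(is_class ~ Location + DFSTime, data = pheno, family = binomial)

# Sequential likelihood-ratio (chi-squared) tests, one row per model term.
lrt <- anova(fit, test = "Chisq")
p_location <- lrt["Location", "Pr(>Chi)"]
p_location
```

Note that `anova()` on a single glm tests terms in the order they enter the formula, so the result for a term can change if you reorder the predictors.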

Kevin