Question

RNAseq and PAM50 prediction

5.3 years ago

graeme.thorn ▴ 110

I've a set of RNAseq data from breast cancer tissue samples (counts and post-cqn-normalised log(RPKM) values) and wish to use the PAM50 classifier to classify them.

I've seen the questions "genefu for PAM50 prediction" and "RNAseq data and PAM50 method", but neither is particularly helpful in terms of what I need to input into the R/genefu predictor (using intrinsic.cluster.predict) to get consistent PAM50 classification. I only have 138 samples, so I'm not going to be able to train the classifier on some of them before running it on the remaining samples.

Is there anywhere with a workflow from RNAseq counts to PAM50 types or can someone provide details as to how to go about this?

RNA-Seq R genefu • 1.7k views

updated 5.3 years ago by Kevin Blighe 89k • written 5.3 years ago by graeme.thorn ▴ 110

score 0 · Answer 1 · 2020-05-04

Hey Graeme,

I do not believe that log(RPKM) values are ideal for this. If they are all that you have, though, then that is no problem.
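To make the input question more concrete, here is a minimal sketch of the kind of rule that PAM50-style calling is usually described as: median-centre each gene across the cohort, then assign each sample to the subtype centroid with the highest Spearman correlation. Everything in it is an assumption for illustration: the four genes stand in for the full 50, the centroid values and sample values are made up, and the function names (`median_centre`, `classify`) are hypothetical, not genefu's API. Because the centring uses the cohort's own medians, the calls can shift with the mix of samples in the cohort.

```python
from statistics import median

def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))

def median_centre(expr):
    """Subtract each gene's median (across all samples) from every sample."""
    samples = list(expr)
    n_genes = len(expr[samples[0]])
    med = [median(expr[s][g] for s in samples) for g in range(n_genes)]
    return {s: [expr[s][g] - med[g] for g in range(n_genes)] for s in samples}

def classify(expr, centroids):
    """Assign each sample to the centroid with the highest Spearman correlation."""
    centred = median_centre(expr)
    return {
        s: max(centroids, key=lambda k: spearman(profile, centroids[k]))
        for s, profile in centred.items()
    }

# Gene order for every vector below: ESR1, ERBB2, KRT5, MKI67.
# Toy centroids (made-up numbers, NOT the published PAM50 centroids).
toy_centroids = {
    "LumA":  [1.0, -0.2, -0.5, -1.0],
    "Her2":  [-0.5, 1.5, -0.3, 0.5],
    "Basal": [-1.5, -0.5, 1.5, 1.0],
}

# Toy log-scale expression values for four samples (made up).
toy_expr = {
    "s1": [8.0, 5.0, 3.0, 2.0],
    "s2": [4.0, 9.0, 3.5, 5.0],
    "s3": [2.0, 4.0, 8.0, 6.0],
    "s4": [7.0, 4.5, 2.5, 3.0],
}

calls = classify(toy_expr, toy_centroids)
print(calls)
```

Since only the ranks of the centred values enter the correlation, any monotone log-scale measure can be fed in, but the result still depends on how the genes were centred.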

I am convinced that a handful, or even more, of those PAM50 genes are not adding much information in terms of risk of metastasis in ER-positive, Her2-negative breast tumours. Nor do I believe that there is any established workflow for you to follow in relation to this, but you should have knowledge of regression and classification models. I gave a previous answer here: "How to exclude some of breast cancer subtypes just by looking at gene expression?"

I would be interested in different approaches:

Random forest
Penalised regression (my previous answer)
Stepwise regression and/or simply including all genes in the same regression model

To use any of these models to full effect, you would have to build it on known cases where metastasis did or did not occur, and then use it to predict the outcome for unknown cases.
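As a rough illustration of the build-then-predict idea, the sketch below fits a ridge-penalised logistic regression on labelled cases and applies it to unlabelled ones. It is a toy under stated assumptions: the two "genes", the expression values, the labels, and the names `fit_ridge_logistic` and `predict_proba` are all hypothetical, and the plain gradient-descent fit is only there to keep the example self-contained. In practice an established implementation (for example glmnet in R) with cross-validated penalty selection would be the sensible choice.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_ridge_logistic(X, y, lam=0.1, lr=0.1, n_iter=2000):
    """Fit logistic regression with an L2 (ridge) penalty by gradient descent.

    X: list of samples, each a list of gene expression values.
    y: list of 0/1 labels (1 = metastasis occurred).
    The intercept is not penalised.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * (gw / n + lam * wj) for wj, gw in zip(w, grad_w)]
    return w, b

def predict_proba(X, w, b):
    """Predicted probability of the positive class for each sample."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

# Made-up, centred expression values for two hypothetical genes.
# Known cases: label 1 = metastasis occurred, 0 = did not occur.
X_known = [
    [2.0, 0.1], [1.8, -0.2], [1.5, 0.3],
    [-1.6, 0.0], [-2.0, 0.2], [-1.7, -0.1],
]
y_known = [1, 1, 1, 0, 0, 0]

# Unknown cases to be predicted with the fitted model.
X_unknown = [[1.9, 0.0], [-1.8, 0.1]]

w, b = fit_ridge_logistic(X_known, y_known)
probs = predict_proba(X_unknown, w, b)
print(w, b, probs)
```

The penalty `lam` shrinks the coefficients of genes that carry little signal, which is the property that makes penalised models a way to ask which of the genes are actually informative.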

Kevin