I've a set of RNAseq data from breast cancer tissue samples (counts and post-cqn-normalised log(RPKM) values) and wish to use the PAM50 classifier to classify them.
I've seen the questions "genefu for PAM50 prediction" and "RNAseq data and PAM50 method", but neither is particularly helpful in terms of what I need to input into the R/genefu predictor (using intrinsic.cluster.predict) to get consistent PAM50 classification. I only have 138 samples, so I'm not going to be able to train the classifier before running it on the remaining samples.
Is there a workflow anywhere that goes from RNAseq counts to PAM50 subtypes, or can someone provide details on how to go about this?
Thanks Kevin, but this work is in collaboration with a commercial company that will be running PAM50 on the non-deduplicated data (the sequencing included UMIs, which we are taking into account and they aren't), so I'm looking for the most robust way of running PAM50 on our data so that we can do a direct comparison.