I was wondering how bioinformaticians fit new samples into their gene-expression clusters.
I have some novel pancreatic tumour RNAseq samples and I want to classify them by subtype. There are several well-established pancreatic subtype classifications, such as Bailey, Collisson and Moffitt:
Bailey, P., Chang, D., Nones, K. et al. Genomic analyses identify molecular subtypes of pancreatic cancer. Nature 531, 47–52 (2016). https://doi.org/10.1038/nature16965
Collisson, E., Sadanandam, A., Olson, P. et al. Subtypes of pancreatic ductal adenocarcinoma and their differing responses to therapy. Nat Med 17, 500–503 (2011). https://doi.org/10.1038/nm.2344
Moffitt, R., Marayati, R., Flate, E. et al. Virtual microdissection identifies distinct tumor- and stroma-specific subtypes of pancreatic ductal adenocarcinoma. Nat Genet 47, 1168–1178 (2015). https://doi.org/10.1038/ng.3398
Is there a way to recreate and use the clustering produced by these papers to predict the subtype of my novel RNAseq data?
I know it must be possible, as they classify each other's data and TCGA data. However, I cannot find how this was done in the methods sections, other than a mention of the package "ConsensusClusterPlus" by Bailey.
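To make the question concrete, here is roughly the kind of thing I imagine: a nearest-centroid sketch in Python, where each new sample is assigned to the subtype whose mean signature-gene profile it correlates with best. This is not taken from any of the papers above, and I do not know whether it matches what they did. The gene names, subtype labels, expression values and the `classify` helper are all made up for illustration.

```python
import math

def zscore(values):
    """Standardise a list of numbers to mean 0 and unit standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

def classify(sample, centroids):
    """Assign a sample to the subtype whose centroid it correlates with best.

    sample:    dict mapping gene -> expression (e.g. log2-scale values)
    centroids: dict mapping subtype -> dict of gene -> mean expression
    """
    scores = {}
    for subtype, centroid in centroids.items():
        # Only compare genes present in both the sample and the centroid.
        genes = sorted(set(sample) & set(centroid))
        if len(genes) < 2:
            raise ValueError("need at least two shared genes")
        scores[subtype] = pearson(
            [sample[g] for g in genes],
            [centroid[g] for g in genes],
        )
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical centroids: mean expression of signature genes per subtype.
# Placeholder gene names and values, NOT from any published signature.
centroids = {
    "subtype_A": {"GENE_1": 8.0, "GENE_2": 7.5, "GENE_3": 2.0, "GENE_4": 1.5},
    "subtype_B": {"GENE_1": 2.0, "GENE_2": 1.0, "GENE_3": 7.0, "GENE_4": 8.0},
}

# Hypothetical new sample on the same expression scale as the centroids.
new_sample = {"GENE_1": 7.0, "GENE_2": 8.0, "GENE_3": 3.0, "GENE_4": 1.0}

best, scores = classify(new_sample, centroids)
print(best, scores)
```

I assume a real version would need the published signature gene lists, reference centroids computed from the original cohorts, and my data normalised in a comparable way, which is exactly the part I cannot find.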
Any links to packages, functions, tutorials or guides would be much appreciated!