I'm trying to recreate the classification of a tumor set using PAM (the pamr package) in R.
I have a data set obtained from the authors of a recent study.
They perform consensus clustering (ConsensusClusterPlus package, CCP below) to derive stable subtypes, and then use those cluster labels to train a classification gene signature with pam.
Using CCP with parameters from the paper I can get a 2-group split with the right number of tumors in both clusters (no RNG-seed was reported in the paper though).
When I use that cluster-split for training with the threshold-parameter from the paper, I get back the correct gene signature with all parameters exactly equal to those published in the supplement of the paper in question.
Using the pamr.predict-function on the data I can also get cluster designations for each tumor sample from pam.
However the paper shows a cross-table of the CCP-cluster designations and pam-designations, and these do not agree with what I see. The CCP-samples are seemingly right, but the pam-classification is off by 4 samples.
Is pam not a completely deterministic classifier for a given threshold, or is there something I have missed?
Are there parameters downstream of fixing the cutoff-parameter (number of discriminating genes) that influence the cluster designations?
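For reference, a minimal sketch of the train/predict/cross-table steps described above, using simulated data in place of the real expression matrix. The object names, the simulated matrix and the threshold value are placeholders and not the paper's; only `pamr.train` and `pamr.predict` are actual pamr calls.

```r
library(pamr)

# Simulated stand-in for the real data: genes in rows, samples in columns.
set.seed(1)
n_genes <- 200
n_samples <- 40
x <- matrix(rnorm(n_genes * n_samples), n_genes, n_samples,
            dimnames = list(paste0("g", seq_len(n_genes)),
                            paste0("s", seq_len(n_samples))))
# Placeholder for the CCP cluster labels.
y <- factor(rep(c("1", "2"), each = n_samples / 2))
x[1:15, y == "2"] <- x[1:15, y == "2"] + 1.5

train_data <- list(x = x, y = y,
                   geneid = rownames(x), genenames = rownames(x))
fit <- pamr.train(train_data)

thr <- 2.0  # placeholder, not the threshold from the paper

# Predict twice with identical inputs to check for run-to-run variation.
pam_call_1 <- pamr.predict(fit, x, threshold = thr, type = "class")
pam_call_2 <- pamr.predict(fit, x, threshold = thr, type = "class")
identical(pam_call_1, pam_call_2)

# pamr.predict also takes `prior` and `threshold.scale`, which default to
# fit$prior and fit$threshold.scale when not supplied.

# Cross-table of CCP labels against pam calls.
table(CCP = y, PAM = pam_call_1)
```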
It is unlikely that a different 2-group CCP solution used for training is the answer, as that would change the pam-derived gene signature. To be sure, I ran CCP 500 times with different RNG seeds to see how many alternative solutions with the "right" number of tumors per cluster exist. There was one other (6/500 runs), and it did not reproduce the right gene signature in pamr.
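The seed scan can be sketched roughly as follows. This is an illustration on simulated data with a handful of seeds, not the run described above: the CCP settings (`reps`, `pItem`, `clusterAlg`, `distance`) are placeholders rather than the paper's, and `maxK = 3` is used only so that the k = 2 result is available as `res[[2]]`. Cluster numbers are arbitrary between runs, so each partition is relabeled relative to the first sample before comparing.

```r
library(ConsensusClusterPlus)

# Simulated stand-in for the expression matrix (genes x samples).
set.seed(1)
x <- matrix(rnorm(100 * 30), 100, 30,
            dimnames = list(paste0("g", 1:100), paste0("s", 1:30)))
x[1:20, 16:30] <- x[1:20, 16:30] + 2

seeds <- 1:10   # the real scan used 500 seeds
pdf(NULL)       # discard the diagnostic plots CCP draws on each run
keys <- vapply(seeds, function(s) {
  res <- ConsensusClusterPlus(x, maxK = 3, reps = 20, pItem = 0.8,
                              pFeature = 1, clusterAlg = "hc",
                              distance = "pearson", seed = s, plot = NULL)
  cls <- res[[2]]$consensusClass
  # Canonical labeling: 0 = same cluster as sample 1, 1 = the other cluster.
  paste(as.integer(cls != cls[1]), collapse = "")
}, character(1))
dev.off()

# Number of runs that produced each distinct 2-group partition.
table(keys)
```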
I also computed the class centroids of the pam genes from the full data and tried nearest-centroid classification using Pearson, Spearman and Euclidean distance, but none of these reproduces the cross-table in the publication.
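A minimal base-R sketch of that centroid comparison, again on simulated data. The signature gene list `sig` and the helper `classify` are hypothetical names introduced for illustration. Note that pamr's own rule is not plain Euclidean distance: it standardizes each gene by its pooled within-class standard deviation and includes a class-prior term, so these three variants are not expected to match pamr exactly.

```r
# Simulated stand-in for the real data: genes in rows, samples in columns.
set.seed(1)
n_genes <- 50
n_samples <- 40
x <- matrix(rnorm(n_genes * n_samples), n_genes, n_samples,
            dimnames = list(paste0("g", seq_len(n_genes)),
                            paste0("s", seq_len(n_samples))))
y <- factor(rep(c("A", "B"), each = n_samples / 2))
x[1:10, y == "B"] <- x[1:10, y == "B"] + 2

sig <- rownames(x)[1:10]   # hypothetical signature genes
x_sig <- x[sig, , drop = FALSE]

# Class centroids over the signature genes (genes x classes).
centroids <- sapply(levels(y), function(k)
  rowMeans(x_sig[, y == k, drop = FALSE]))

# Assign each sample to the closest centroid under the chosen measure.
classify <- function(x_sig, centroids, method) {
  score <- switch(method,
    pearson   = -cor(x_sig, centroids, method = "pearson"),
    spearman  = -cor(x_sig, centroids, method = "spearman"),
    euclidean = apply(centroids, 2, function(ctr)
      sqrt(colSums((x_sig - ctr)^2))))
  # score is samples x classes; the smallest value wins.
  factor(colnames(centroids)[apply(score, 1, which.min)],
         levels = colnames(centroids))
}

calls <- lapply(c(pearson = "pearson", spearman = "spearman",
                  euclidean = "euclidean"),
                function(m) classify(x_sig, centroids, m))

# One cross-table per distance measure.
lapply(calls, function(p) table(cluster = y, centroid_call = p))
```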
It is important for me to reproduce the exact clustering results from the paper, which is why I obtained the data from the authors. However, they did not include the cluster calls for individual samples.
I guess the next step is to bug the authors a bit more, but I wanted to check first if I have missed something very obvious.