There's a widely used set of 50 genes, PAM50, for classifying breast cancer intrinsic subtypes. It was introduced in this paper:
Parker et al., "Supervised Risk Predictor of Breast Cancer Based on Intrinsic Subtypes", http://jco.ascopubs.org/content/27/8/1160.abstract
Where can the actual list of the 50 genes obtained through their analysis be found? I have not seen it as a supplementary table in any of the papers. The only place I have seen the genes named is in the first figure of this paper: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3487945/figure/F1/ but I was hoping to find a parseable, downloadable format rather than retyping gene symbols from the figure.
Edit: I just typed it out from the image, which is primitive and error-prone (and using gene symbols to identify genes is imprecise), but here it is in case others find it helpful:
UBE2T
BIRC5
NUF2
CDC6
CCNB1
TYMS
MYBL2
CEP55
MELK
NDC80
RRM2
UBE2C
CENPF
PTTG1
EXO1
ORC6L
ANLN
CCNE1
CDC20
MKI67
KIF2C
ACTR3B
MYC
EGFR
KRT5
PHGDH
CDH3
MIA
KRT17
FOXC1
SFRP1
KRT14
ESR1
SLC39A6
BAG1
MAPT
PGR
CXXC5
MLPH
BCL2
MDM2
NAT1
FOXA1
BLVRA
MMP11
GPR160
FGFR4
GRB7
TMEM45B
ERBB2
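Since the list above is a hand transcription, one way to make it parseable is to keep it as a plain data structure and sanity-check it. Below is a minimal Python sketch, assuming the transcription above is correct. It stores the 50 symbols, checks that there are 50 unique entries, and emits a two-column TSV. The `SYMBOL_UPDATES` map is an illustrative assumption, not an official resource: its only entry is ORC6L, whose current HGNC symbol is ORC6. Any other symbol should be checked against HGNC before you match it to expression data.

```python
from __future__ import annotations

import csv
import io

# PAM50 symbols as hand-transcribed above from Figure 1 of PMC3487945;
# double-check against the figure before relying on this list.
PAM50_GENES = [
    "UBE2T", "BIRC5", "NUF2", "CDC6", "CCNB1", "TYMS", "MYBL2", "CEP55",
    "MELK", "NDC80", "RRM2", "UBE2C", "CENPF", "PTTG1", "EXO1", "ORC6L",
    "ANLN", "CCNE1", "CDC20", "MKI67", "KIF2C", "ACTR3B", "MYC", "EGFR",
    "KRT5", "PHGDH", "CDH3", "MIA", "KRT17", "FOXC1", "SFRP1", "KRT14",
    "ESR1", "SLC39A6", "BAG1", "MAPT", "PGR", "CXXC5", "MLPH", "BCL2",
    "MDM2", "NAT1", "FOXA1", "BLVRA", "MMP11", "GPR160", "FGFR4", "GRB7",
    "TMEM45B", "ERBB2",
]

# Illustrative map from outdated to current HGNC symbols (an assumption to
# verify, not an official resource). ORC6L is now ORC6 in HGNC.
SYMBOL_UPDATES = {"ORC6L": "ORC6"}


def parse_gene_list(text: str) -> list[str]:
    """Parse one symbol per line, skipping blank lines; reject duplicates."""
    genes = [line.strip() for line in text.splitlines() if line.strip()]
    if len(genes) != len(set(genes)):
        raise ValueError("duplicate gene symbols in list")
    return genes


def updated_symbols(genes: list[str]) -> list[str]:
    """Replace any symbol found in SYMBOL_UPDATES; leave the rest unchanged."""
    return [SYMBOL_UPDATES.get(g, g) for g in genes]


def to_tsv(genes: list[str]) -> str:
    """Two-column TSV: the symbol as transcribed and its updated symbol."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["original_symbol", "current_symbol"])
    for gene in genes:
        writer.writerow([gene, SYMBOL_UPDATES.get(gene, gene)])
    return buf.getvalue()


if __name__ == "__main__":
    # Sanity check: exactly 50 distinct symbols were transcribed.
    assert len(PAM50_GENES) == 50 and len(set(PAM50_GENES)) == 50
    print(to_tsv(PAM50_GENES), end="")
```

If you paste the list into a string and call `parse_gene_list`, you get the same 50 symbols back. Matching on updated symbols, or better on stable gene IDs, reduces the imprecision of relying on symbols alone.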
Have you tried writing to the corresponding author of Parker et al.?
No, because I figured that PAM50 is so widely cited and used that I must be missing something obvious and it's out there in a parseable format. Otherwise, how are other people using it? The paper has over a thousand citations!
It's worth noting this paper: "Most Random Gene Expression Signatures Are Significantly Associated with Breast Cancer Outcome."
I don't see the relevance. PAM50 is primarily used for subtyping tumors, not for predicting outcomes. I don't think most gene expression signatures can subtype tumors correctly at all.
The relevance is that you need to be cautious and critical of claims that use gene signatures in classification (outcomes, subtypes, whatever).
Fair enough, but I don't see any evidence that this randomness result holds for subtypes. In breast cancer, the subtypes have biological meaning, and it's unclear why random gene signatures would recapitulate that.
If you are happy with their methods and results, then that is what matters. Biological meaning is indeed important, and it is missing from a large number of published classifiers.
I agree with this; however, the point is that using survival to validate a stratification assumes that each molecular subtype of cancer has a significantly different survival-time distribution. That approach breaks down when cancer subtypes have similar survival-time distributions.
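The random-signature caution raised above can be turned into a concrete control. Score the signature of interest and many size-matched random gene sets the same way, then see how often the random sets do at least as well. Below is a minimal Python sketch of that comparison, assuming `background` is the list of genes measured on your platform. The scoring function (subtype agreement, survival association, or anything else) is left to you. `random_signatures` and `empirical_p` are hypothetical helpers, not code from any of the papers discussed.

```python
import random
from typing import List, Sequence


def random_signatures(
    background: Sequence[str],
    signature: Sequence[str],
    n_sets: int = 1000,
    seed: int = 0,
) -> List[List[str]]:
    """Draw n_sets random gene sets, each the same size as `signature`,
    from `background` with the signature's own genes excluded."""
    excluded = set(signature)
    # Sort the pool so draws are reproducible for a given seed
    # (set iteration order of strings varies between Python runs).
    pool = sorted(g for g in set(background) if g not in excluded)
    if len(pool) < len(signature):
        raise ValueError("background too small for a size-matched draw")
    rng = random.Random(seed)
    return [rng.sample(pool, len(signature)) for _ in range(n_sets)]


def empirical_p(observed: float, null_scores: Sequence[float]) -> float:
    """Fraction of random sets scoring at least as well as the real signature,
    with a +1 correction so the estimate is never exactly zero."""
    hits = sum(1 for s in null_scores if s >= observed)
    return (hits + 1) / (len(null_scores) + 1)
```

A small empirical p-value only says the signature beats random gene sets under your chosen score. It does not by itself establish that the resulting groups are biologically meaningful.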