Question

Where To Download Pam50 Gene Set?

15

11.8 years ago

user ▴ 960

PAM50 is a widely used set of 50 genes for classifying breast cancer subtypes, introduced in this paper:

Parker et al., Supervised Risk Predictor of Breast Cancer Based on Intrinsic Subtypes http://jco.ascopubs.org/content/27/8/1160.abstract

Where can the actual listing of the 50 genes obtained through their analysis be found? I have not seen it as a supplementary table in any of the papers. The only place I saw the genes named is in the first figure of this paper: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3487945/figure/F1/ but I was hoping to find a more parseable, downloadable format instead of retyping gene symbols from the figure.

Edit: I just typed it out from the image, which is primitive and error-prone (and using gene symbols to identify genes is imprecise), but here it is in case others find it helpful:

UBE2T
BIRC5
NUF2
CDC6
CCNB1
TYMS
MYBL2
CEP55
MELK
NDC80
RRM2
UBE2C
CENPF
PTTG1
EXO1
ORC6L
ANLN
CCNE1
CDC20
MKI67
KIF2C
ACTR3B
MYC
EGFR
KRT5
PHGDH
CDH3
MIA
KRT17
FOXC1
SFRP1
KRT14
ESR1
SLC39A6
BAG1
MAPT
PGR
CXXC5
MLPH
BCL2
MDM2
NAT1
FOXA1
BLVRA
MMP11
GPR160
FGFR4
GRB7
TMEM45B
ERBB2
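Since the list above was retyped by hand, a quick sanity check can catch the most obvious transcription slips. The following is only a minimal sketch: `PAM50_TYPED` and `check_symbols` are hypothetical names introduced here, and the checks (count, duplicates, unusual characters) are assumptions about what is worth testing. It cannot confirm that a symbol is the right gene.

```python
from collections import Counter

# Symbols exactly as typed in the list above (transcribed from a figure).
PAM50_TYPED = """
UBE2T BIRC5 NUF2 CDC6 CCNB1 TYMS MYBL2 CEP55 MELK NDC80
RRM2 UBE2C CENPF PTTG1 EXO1 ORC6L ANLN CCNE1 CDC20 MKI67
KIF2C ACTR3B MYC EGFR KRT5 PHGDH CDH3 MIA KRT17 FOXC1
SFRP1 KRT14 ESR1 SLC39A6 BAG1 MAPT PGR CXXC5 MLPH BCL2
MDM2 NAT1 FOXA1 BLVRA MMP11 GPR160 FGFR4 GRB7 TMEM45B ERBB2
""".split()


def check_symbols(symbols, expected=50):
    """Return a list of problems found in a hand-typed symbol list."""
    problems = []
    if len(symbols) != expected:
        problems.append(f"expected {expected} symbols, got {len(symbols)}")
    # Symbols that appear more than once.
    dupes = [s for s, n in Counter(symbols).items() if n > 1]
    if dupes:
        problems.append(f"duplicates: {sorted(dupes)}")
    # Anything that is not plain upper-case letters and digits.
    odd = [s for s in symbols if not s.isalnum() or s != s.upper()]
    if odd:
        problems.append(f"unusual spelling: {odd}")
    return problems


problems = check_symbols(PAM50_TYPED)
print(problems or "50 unique symbols, no obvious typos")
```

Passing this check says nothing about symbol currency. For example, ORC6L appears to be an older symbol for the gene now listed as ORC6, so matching against a current annotation may need an alias lookup.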

cancer annotation classification • 32k views

ADD COMMENT • link updated 6 months ago by joelsparker1 ▴ 200 • written 11.8 years ago by user ▴ 960

1

Have you tried writing to the corresponding author of Parker et al.?

ADD REPLY • link 11.8 years ago by Alex Paciorkowski 3.5k

0

No, because I figured that PAM50 is so widely cited and used that I must be missing something obvious and it's out there in a parseable format. Otherwise, how are other people using it? The paper has over a thousand citations!

ADD REPLY • link 11.8 years ago by user ▴ 960

1

It's worth noting that: Most Random Gene Expression Signatures Are Significantly Associated with Breast Cancer Outcome.

ADD REPLY • link 11.8 years ago by Neilfws 49k

0

I don't see the relevance. PAM50 is used primarily for subtyping tumors, not for predicting outcomes. I don't think most gene expression signatures can subtype tumors correctly at all.

ADD REPLY • link 11.8 years ago by user ▴ 960

0

The relevance is that you need to be cautious and critical of claims that use gene signatures in classification (outcomes, subtypes, whatever).

ADD REPLY • link 11.8 years ago by Neilfws 49k

0

Fair enough, but I don't see any evidence that this randomness result holds for subtypes. In breast cancer, the subtypes have biological meaning, and it's unclear why random gene signatures would recapitulate that.

ADD REPLY • link 11.8 years ago by user ▴ 960

0

If you are happy with their methods and results, then that is what matters. Biological meaning is indeed important and something missing from a large number of published classifiers.

ADD REPLY • link 11.8 years ago by Neilfws 49k

0

I agree with this; however, the point is that using survival as a means of validating stratification is based on the assumption that each molecular subtype of cancer has significantly different survival time distributions. It is not valid in the case where cancer subtypes have similar distributions of survival times.

ADD REPLY • link 10.6 years ago by michael.sharpnack • 0

score 20 · Answer 1 · 2014-02-19

20

11.2 years ago

joelsparker1 ▴ 200

Edited with new links

My apologies that these were difficult to find, but the information is out there in a usable form. The centroids, gene lists, and R code to produce the classification are all available along with the clinical information for the training set on this page: https://unclineberger.org/peroulab/algorithms/

Specifically, the R code and supporting data files are here: https://genome-publications.bioinf.unc.edu/PAM50/

Anyone running PAM50 (or any classifier based on relative measurements such as expression) should understand the concepts in this paper: http://www.breast-cancer-research.com/content/pdf/s13058-015-0520-4.pdf

ADD COMMENT • link 6 months ago by joelsparker1 ▴ 200

1

All the links given ask you to send an email to get the data.

ADD REPLY • link 22 months ago by DareDevil ★ 4.4k

0

It appears that the link to "classification of the PAM50 plus Claudin-low" is broken. Can anyone post the pdf link?

ADD REPLY • link 4.9 years ago by gacrestani • 0

score 9 · Answer 2 · 2014-01-27

Hello, List of PAM50 genes: Gene symbol: ACTR3B, ANLN, BAG1, BCL2, BIRC5, BLVRA, CCNB1, CCNE1, CDC20, CDC6, CDH3, CENPF, CEP55, CXXC5, EGFR, ERBB2, ESR1, EXO1, FGFR4, FOXA1, FOXC1, GPR160, GRB7, KIF2C, KRT14, KRT17, KRT5, MAPT, MDM2, MELK, MIA, MKI67, MLPH, MMP11, MYBL2, MYC, NAT1, NDC80, NUF2, ORC6L, PGR, PHGDH, PTTG1, RRM2, SFRP1, SLC39A6, TMEM45B, TYMS, UBE2C, UBE2T

Their corresponding accession numbers:

accession no: AB209174, NM_018685, NM_004323, NM_000633, NM_001012271, BX647539, NM_031966, BC035498, BG256659, NM_001254, NM_001793, NM_016343, NM_018131, BC006428, NM_005228, NM_001005862, NM_001122742, NM_130398, AB209631, NM_004496, NM_001453, AJ249248, NM_005310, NM_006845, BC042437, AK095281, M21389, NM_001123066, M92424, NM_014791, BG765502, NM_002417, NM_024101, NM_005940, BX647151, NM_002467, BC013732, NM_006101, NM_145697, NM_014321, NM_000926, AK093306, BE904476, AK123010, BC036503, NM_012319, AK098106, BQ056428, BC032677, BF690859
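For anyone who wants these two lists in a parseable form, they can be paired into a tab-separated table. This is a sketch that assumes the accession numbers are given in the same order as the gene symbols, as the answer implies but does not state. `SYMBOLS`, `ACCESSIONS`, and `to_tsv` are hypothetical names introduced here.

```python
import csv
import io

# Both lists copied verbatim from the answer above.
SYMBOLS = (
    "ACTR3B, ANLN, BAG1, BCL2, BIRC5, BLVRA, CCNB1, CCNE1, CDC20, CDC6, "
    "CDH3, CENPF, CEP55, CXXC5, EGFR, ERBB2, ESR1, EXO1, FGFR4, FOXA1, "
    "FOXC1, GPR160, GRB7, KIF2C, KRT14, KRT17, KRT5, MAPT, MDM2, MELK, "
    "MIA, MKI67, MLPH, MMP11, MYBL2, MYC, NAT1, NDC80, NUF2, ORC6L, "
    "PGR, PHGDH, PTTG1, RRM2, SFRP1, SLC39A6, TMEM45B, TYMS, UBE2C, UBE2T"
).split(", ")

ACCESSIONS = (
    "AB209174, NM_018685, NM_004323, NM_000633, NM_001012271, BX647539, "
    "NM_031966, BC035498, BG256659, NM_001254, NM_001793, NM_016343, "
    "NM_018131, BC006428, NM_005228, NM_001005862, NM_001122742, NM_130398, "
    "AB209631, NM_004496, NM_001453, AJ249248, NM_005310, NM_006845, "
    "BC042437, AK095281, M21389, NM_001123066, M92424, NM_014791, "
    "BG765502, NM_002417, NM_024101, NM_005940, BX647151, NM_002467, "
    "BC013732, NM_006101, NM_145697, NM_014321, NM_000926, AK093306, "
    "BE904476, AK123010, BC036503, NM_012319, AK098106, BQ056428, "
    "BC032677, BF690859"
).split(", ")


def to_tsv(symbols, accessions):
    """Pair symbols with accessions by position and return TSV text."""
    if len(symbols) != len(accessions):
        raise ValueError(
            f"{len(symbols)} symbols but {len(accessions)} accessions"
        )
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["symbol", "accession"])
    writer.writerows(zip(symbols, accessions))
    return buf.getvalue()


tsv_text = to_tsv(SYMBOLS, ACCESSIONS)
print(tsv_text.splitlines()[1])  # first data row
```

Writing `tsv_text` to a file gives a table that R or pandas can read directly. The positional pairing is worth spot-checking against a sequence database before relying on it.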

score 6 · Answer 3 · 2013-07-29

6

11.8 years ago

arno.guille ▴ 420

You can download the PAM50, Sorlie500, and Hu306 gene sets from the supplementary data of this paper: Breast cancer molecular profiling with single sample predictors: a retrospective analysis. http://www.ncbi.nlm.nih.gov/pubmed/20181526

Or use the genefu package from Bioconductor: http://www.bioconductor.org/packages/2.12/bioc/manuals/genefu/man/genefu.pdf

Hope this helps.

ADD COMMENT • link 11.8 years ago by arno.guille ▴ 420

0

I suppose R code in a PDF behind a paywall is a little better than a PNG :) The Bioconductor link is good, though.

ADD REPLY • link 11.8 years ago by Neilfws 49k

score 3 · Answer 4 · 2013-07-28

Much "Googling" leads me to the same conclusion as you: this list of genes is not readily available as a list in plain text format. The best I could find was Figure A2 in this reference and this bookmark in the Cancer Genome Browser. There might be a way to download the list using the latter resource.

This is rather typical for cancer research, where results are often commercialized and/or patented, so it's in the interests of researchers to hide and obfuscate the raw data. They might also have chosen a name for the classifier that doesn't sound like a scoring matrix for sequence alignment!