Is the PAM50 subtype available for TCGA BRCA data?
9.1 years ago
nashtf ▴ 20

I have some UNC Illumina RNAseqV2 data covering about 100 genes and 800 patients, identified by UNC ID. I'd like to find the subtype of each tumor (normal, luminal A, luminal B, basal, HER2) for a classifier, preferably keyed by UNC ID, but if the TCGA barcode is provided I believe it's possible to match them up. I can't find this on the TCGA website. Maybe I'm just looking in the wrong places.

breast-cancer tcga rna-seq • 10k views
8.6 years ago
hAjmal ▴ 50

You can use TCGAbiolinks to retrieve the list of samples together with their PAM50 subtype:

source("http://www.bioconductor.org/biocLite.R")  # loads the Bioconductor installer
library(TCGAbiolinks)

cancer <- "BRCA"
PlatformCancer <- "IlluminaHiSeq_RNASeqV2"
dataType <- "rsem.genes.results"  # not used below
pathCancer <- "TCGAData/miRNA"    # not used below

# list the level 3 RNASeqV2 samples available for BRCA
datQuery <- TCGAquery(tumor = cancer, platform = PlatformCancer, level = "3")
lsSample <- TCGAquery_samplesfilter(query = datQuery)

# get subtype information; the first column holds the patient identifier
dataSubt <- TCGAquery_subtype(tumor = cancer)
lumA <- dataSubt[which(dataSubt$PAM50.mRNA == "Luminal A"), 1]

# keep the sample barcodes that contain a luminal A patient identifier
allSamples <- lsSample$IlluminaHiSeq_RNASeqV2  # 1218 total samples
lumASamples <- allSamples[grep(x = allSamples, pattern = paste(lumA, collapse = "|"))]  # 263 luminal A samples found
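
The last step above matches sample barcodes against patient identifiers with one long regular expression. A minimal alternative sketch in base R, assuming the subtype table is keyed by the 12-character patient portion of the TCGA barcode: cut each sample barcode down to that prefix and use `%in%`. The barcodes and the `lum_a_patients` vector below are made up for illustration and are not real TCGA identifiers.

```r
# Made-up sample barcodes in TCGA format (not real identifiers).
all_samples <- c(
  "TCGA-XX-0001-01A-11R-0000-07",  # tumor sample, patient 0001
  "TCGA-XX-0001-11A-11R-0000-07",  # matched normal sample, patient 0001
  "TCGA-XX-0002-01A-11R-0000-07",  # tumor sample, patient 0002
  "TCGA-XX-0003-01A-11R-0000-07"   # tumor sample, patient 0003
)

# Hypothetical patient identifiers called luminal A in a subtype table.
lum_a_patients <- c("TCGA-XX-0001", "TCGA-XX-0003")

# Characters 1-12 of a TCGA barcode identify the patient;
# characters 14-15 are the sample type code (01 = primary tumor, 11 = normal tissue).
patient_id  <- substr(all_samples, 1, 12)
sample_type <- substr(all_samples, 14, 15)

# All samples from luminal A patients, then only their primary tumor samples.
lum_a_samples <- all_samples[patient_id %in% lum_a_patients]
lum_a_tumors  <- all_samples[patient_id %in% lum_a_patients & sample_type == "01"]

print(lum_a_tumors)
```

Filtering on the sample type code keeps matched normal tissue from the same patient from being labelled with the tumor's subtype.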
9.1 years ago

You'll find a list derived from microarrays (Nature 2012 release) at

https://tcga-data.nci.nih.gov/docs/publications/brca_2012/

specifically

http://tcga-data.nci.nih.gov/docs/publications/brca_2012/BRCA.547.PAM50.SigClust.Subtypes.txt

It appears that there is no canonical PAM50 call set for the RNAseq version, so everyone makes their own calls (using the genefu package or some other means) and gets somewhat different results for the edge case tumors.


Does that mean that PAM50 on RNA-seq data is not a stable test?


@kanwarjag: I assume your answer is meant as a comment to my answer above; if so, please leave it as a comment next time rather than posting an answer to the question. What I mean by "somewhat different results for edge case tumors" is that, in practice, when you use genefu to assign PAM50 classes, the assignments depend on the centroid values used for the individual subtypes. For most tumors the assignment is robust to small variations in these values, but in my experience about 10% of tumors are sensitive to them, and it's hard to get agreement on those tumors. This is not related to RNAseq or the choice of technology; it's a function of how the test is performed.
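
To make the centroid sensitivity concrete, here is a toy sketch in base R. It is not genefu and does not use the real PAM50 genes or centroids: the six genes, two centroids and two tumors are made-up values, and Spearman correlation is assumed as the similarity measure. Each tumor goes to the subtype whose centroid it correlates with best, and one centroid value is then nudged slightly.

```r
# Toy nearest-centroid subtype assignment (illustrative values only;
# these are NOT the real PAM50 genes or centroids).
genes <- paste0("g", 1:6)

# Hypothetical centroids: rows are genes, columns are subtypes.
centroids <- cbind(
  LumA  = c(1.0, 2.0, 3.0, 3.1, 5.0, 6.0),
  Basal = c(6.0, 5.0, 4.0, 3.0, 2.0, 1.0)
)
rownames(centroids) <- genes

# Two hypothetical tumors: one clearly LumA-like, one borderline.
expr <- cbind(
  clear_tumor = c(0.5, 1.1, 2.0, 2.9, 4.2, 5.5),
  edge_tumor  = c(4.8, 2.7, 1.9, 3.6, 0.4, 6.1)
)
rownames(expr) <- genes

# Assign each tumor to the subtype whose centroid has the highest
# Spearman correlation with the tumor's expression profile.
assign_subtype <- function(expr, centroids) {
  cors <- cor(expr, centroids, method = "spearman")  # tumors x subtypes
  setNames(colnames(centroids)[apply(cors, 1, which.max)], colnames(expr))
}

calls_original <- assign_subtype(expr, centroids)

# Nudge a single centroid value slightly (g3 of LumA: 3.0 -> 3.15).
perturbed <- centroids
perturbed["g3", "LumA"] <- 3.15
calls_perturbed <- assign_subtype(expr, perturbed)

print(rbind(original = calls_original, perturbed = calls_perturbed))
```

In this toy setup the clearly LumA-like tumor keeps its call, while the borderline tumor moves from LumA to Basal after the nudge, because its correlations with the two centroids were nearly equal to begin with.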
