How to get ER, PR and HER2 data from TCGA BRCA
4.4 years ago
StartR ▴ 30

Hi, I have downloaded the BRCA data from TCGA using TCGAbiolinks.

I have done this:

BRCARnaseqSE <- GDCprepare(query.a, directory = "BRCA_all")
sample.info <- SummarizedExperiment::colData(BRCARnaseqSE)

Now I want to get the ER, PR and HER2 status (positive or negative) for each sample, but I cannot find any such columns. Here are the column names of sample.info:

names(sample.info)
 [1] "sample"                                      "patient"                                     "barcode"                                    
 [4] "shortLetterCode"                             "definition"                                  "days_to_recurrence"                         
 [7] "ajcc_staging_system_edition"                 "days_to_last_follow_up"                      "classification_of_tumor"                    
[10] "age_at_diagnosis"                            "icd_10_code"                                 "prior_malignancy"                           
[13] "morphology"                                  "created_datetime.x"                          "last_known_disease_status"                  
[16] "tumor_stage"                                 "updated_datetime.x"                          "days_to_last_known_disease_status"          
[19] "ajcc_pathologic_t"                           "treatments"                                  "year_of_diagnosis"                          
[22] "synchronous_malignancy"                      "state.x"                                     "ajcc_pathologic_m"                          
[25] "progression_or_recurrence"                   "prior_treatment"                             "site_of_resection_or_biopsy"                
[28] "ajcc_pathologic_n"                           "days_to_diagnosis"                           "tissue_or_organ_of_origin"                  
[31] "diagnosis_id"                                "tumor_grade"                                 "primary_diagnosis"                          
[34] "ajcc_pathologic_stage"                       "created_datetime.y"                          "cigarettes_per_day"                         
[37] "state.y"                                     "bmi"                                         "weight"                                     
[40] "exposure_id"                                 "height"                                      "alcohol_intensity"                          
[43] "alcohol_history"                             "updated_datetime.y"                          "years_smoked"                               
[46] "gender"                                      "created_datetime"                            "days_to_birth"                              
[49] "state"                                       "race"                                        "ethnicity"                                  
[52] "demographic_id"                              "year_of_birth"                               "vital_status"                               
[55] "age_at_index"                                "year_of_death"                               "updated_datetime"                           
[58] "days_to_death"                               "bcr_patient_barcode"                         "project_id"                                 
[61] "disease_type"                                "dbgap_accession_number"                      "name"                                       
[64] "released"                                    "releasable"                                  "primary_site"                               
[67] "is_ffpe"                                     "subtype_patient"                             "subtype_Tumor.Type"                         
[70] "subtype_Included_in_previous_marker_papers"  "subtype_vital_status"                        "subtype_days_to_birth"                      
[73] "subtype_days_to_death"                       "subtype_days_to_last_followup"               "subtype_age_at_initial_pathologic_diagnosis"
[76] "subtype_pathologic_stage"                    "subtype_Tumor_Grade"                         "subtype_BRCA_Pathology"                     
[79] "subtype_BRCA_Subtype_PAM50"                  "subtype_MSI_status"                          "subtype_HPV_Status"                         
[82] "subtype_tobacco_smoking_history"             "subtype_CNV.Clusters"                        "subtype_Mutation.Clusters"                  
[85] "subtype_DNA.Methylation.Clusters"            "subtype_mRNA.Clusters"                       "subtype_miRNA.Clusters"                     
[88] "subtype_lncRNA.Clusters"                     "subtype_Protein.Clusters"                    "subtype_PARADIGM.Clusters"                  
[91] "subtype_Pan.Gyn.Clusters"

I cannot see any info related to ER status, nor anything like er_status_by_ihc, pr_status_by_ihc or her2_status_by_ihc.

Please help!

Thanks!

BRCA TCGA • 2.6k views
4.4 years ago

I'm not sure about TCGAbiolinks, but the information is definitely available at the GDC Data Portal; see the answer at: How to download triple negative breast cancer RNA-seq fpkm data from GDC.

You can feasibly use that information and link it up to your TCGAbiolinks output.

Kevin

12 months ago
Ram 44k

I came across this post because I had the same question. Here's the way I did it 3 years ago (saved in an old code file) and tested today (20-Nov-2023):

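library(tidyverse)
library(TCGAbiolinks)

# Query and download the BCR Biotab clinical supplement for TCGA-BRCA
query <- GDCquery(project = "TCGA-BRCA",
                  data.category = "Clinical",
                  data.type = "Clinical Supplement",
                  data.format = "BCR Biotab")
GDCdownload(query)
clinical.all <- GDCprepare(query)  # list of tables, one per Biotab file

# Patient-level table; it holds the IHC receptor status columns
tcga_brca.clin <- clinical.all$clinical_patient_brca

# Triple negative: ER, PR and HER2 all 'Negative' by IHC
tcga_brca.tnbc_samples <- tcga_brca.clin %>%
    filter(er_status_by_ihc == 'Negative' &
           pr_status_by_ihc == 'Negative' &
           her2_status_by_ihc == 'Negative') %>%
    pull(bcr_patient_barcode)

# ER positive; note that != 'Positive' keeps every HER2 value other than
# 'Positive', not only 'Negative'
tcga_brca.er_samples <- tcga_brca.clin %>%
    filter(er_status_by_ihc == 'Positive' &
           her2_status_by_ihc != 'Positive') %>%
    pull(bcr_patient_barcode)

# HER2 positive, regardless of ER/PR status
tcga_brca.her2_samples <- tcga_brca.clin %>%
    filter(her2_status_by_ihc == 'Positive') %>%
    pull(bcr_patient_barcode)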
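
Before filtering on exact strings like 'Negative', it can help to check which values each status column actually contains. Here is a minimal sketch in base R on a toy data frame; the barcodes and status values are made up for illustration (clin is a hypothetical stand-in for the clinical_patient_brca table), so the real categories may differ:

```r
# Toy stand-in for clinical.all$clinical_patient_brca (all values are made up)
clin <- data.frame(
  bcr_patient_barcode = c("TCGA-AA-0001", "TCGA-AA-0002",
                          "TCGA-AA-0003", "TCGA-AA-0004"),
  er_status_by_ihc    = c("Negative", "Positive", "Negative", "[Not Evaluated]"),
  pr_status_by_ihc    = c("Negative", "Positive", "Negative", "Negative"),
  her2_status_by_ihc  = c("Negative", "Equivocal", "Positive", "Negative"),
  stringsAsFactors = FALSE
)

status_cols <- c("er_status_by_ihc", "pr_status_by_ihc", "her2_status_by_ihc")

# One frequency table per receptor column, counting NA too if present
status_counts <- lapply(clin[status_cols], table, useNA = "ifany")
print(status_counts)

# Patients whose HER2 status is neither 'Positive' nor 'Negative';
# a filter such as her2_status_by_ihc != 'Positive' would keep these
her2_other <- clin$bcr_patient_barcode[
  !clin$her2_status_by_ihc %in% c("Positive", "Negative")
]
print(her2_other)
```

If the real table shows categories other than 'Positive' and 'Negative', decide explicitly whether each group should include or exclude them.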

The BCR Biotab format gives info on a very limited number of samples. The BCR XML data.format has info on a lot more samples, but I cannot find a function that parses it. Even the GDCprepare_clinic function seems to work on a rather limited subset of XML fields. I'm writing my own hack and will update as soon as it's done.

I think I was wrong: both the Biotab and XML formats give us the same data, just split across a different number of files. I ran a preliminary test: the 116 TNBC patient IDs (TCGA-XX-XXXX) overlap 100% between the two formats.
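
To link the patient-level receptor status back to the sample-level table from the question, one option is to match on the patient ID, which is the first 12 characters of a TCGA barcode (TCGA-XX-XXXX). Below is a rough sketch in base R using toy data; the barcodes and statuses are invented, and sample.info and clin here are stand-ins for colData(BRCARnaseqSE) and the clinical_patient_brca table:

```r
# Toy stand-in for the sample-level table (barcodes are made up)
sample.info <- data.frame(
  barcode = c("TCGA-AA-0001-01A-11R-A000-07",
              "TCGA-AA-0002-01A-11R-A000-07",
              "TCGA-AA-0002-11A-11R-A000-07",
              "TCGA-AA-0009-01A-11R-A000-07"),
  stringsAsFactors = FALSE
)

# Patient ID = first 12 characters of the full barcode
sample.info$patient <- substr(sample.info$barcode, 1, 12)

# Toy stand-in for the patient-level clinical table (values are made up)
clin <- data.frame(
  bcr_patient_barcode = c("TCGA-AA-0001", "TCGA-AA-0002"),
  er_status_by_ihc    = c("Negative", "Positive"),
  pr_status_by_ihc    = c("Negative", "Positive"),
  her2_status_by_ihc  = c("Negative", "Negative"),
  stringsAsFactors = FALSE
)

# match() keeps the sample order intact; patients without a clinical
# record get NA in the new columns
idx <- match(sample.info$patient, clin$bcr_patient_barcode)
for (col in c("er_status_by_ihc", "pr_status_by_ihc", "her2_status_by_ihc")) {
  sample.info[[col]] <- clin[[col]][idx]
}

print(sample.info)
```

With the real object, the same columns could be assigned onto colData(BRCARnaseqSE) in the same way. Note that every sample from a patient inherits that patient's status, including normal tissue samples, so you may want to subset to tumor samples first, for example using a sample-type column such as shortLetterCode or definition.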


Thank you Sir for posting.
