Hi there,
I downloaded TCGA BRCA RNA-seq data both from the UCSC Cancer Browser and with TCGAbiolinks:
library(TCGAbiolinks)
query <- GDCquery(project = 'TCGA-BRCA', data.category = 'Transcriptome Profiling', data.type = 'Gene Expression Quantification', workflow.type = 'HTSeq - Counts')
GDCdownload(query)
brca.seq <- GDCprepare(query)
Then I checked the expression of SOX10:
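library(DESeq2)  # attaches SummarizedExperiment, which provides rowData() and assay()
r <- rowData(brca.seq)
# Pull the row whose gene name is SOX10 and flatten it to a numeric vector
as.numeric(assay(brca.seq[which(r$external_gene_name == 'SOX10'), ]))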
It turns out that its expression is zero in all patients, but in the data from the UCSC Cancer Browser (HiSeqV2) the average SOX10 expression is 6. The UCSC data can be found here: https://tcga.xenahubs.net/download/TCGA.BRCA.sampleMap/HiSeqV2.gz
Another question: TCGAbiolinks is more up to date than the UCSC Cancer Browser because it downloads data directly from TCGA, right?
Thank you!
Thank you for your quick response! I downloaded the UCSC Xena data from here, unzipped it, and opened the file in Excel: https://tcga.xenahubs.net/download/TCGA.BRCA.sampleMap/HiSeqV2.gz
Then I took a look at the SOX10 expression data, and the first 5 numbers are 6.5221 0 8.308 6.3628 0.5819. Maybe I made a mistake here.
Ah, that is the legacy TCGA data, not the TCGA data from the GDC. TCGAbiolinks retrieves the data from the GDC, as far as I can tell. The GDC TCGA data on Xena is here: https://gdc.xenahubs.net/download/TCGA-BRCA/Xena_Matrices/TCGA-BRCA.htseq_fpkm-uq.tsv.gz
As to why the legacy TCGA data is different from the TCGA data from the GDC, I recommend contacting the GDC: https://gdc.cancer.gov/support
The legacy TCGA data is based on the hg19 reference genome, whereas the TCGA data from the GDC now uses hg38, so some differences are expected.
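One more thing to keep in mind when comparing the two sources: the numbers are not on the same scale. Below is a minimal base-R sketch, assuming the HiSeqV2 matrix stores log2(normalized count + 1) as described on its Xena dataset page, while the HTSeq counts from the GDC are raw integers. The helper names to_log2p1 and from_log2p1 are made up for illustration and are not part of any package. It only uses the five SOX10 values quoted earlier in the thread.

```r
# Hypothetical helpers (not from TCGAbiolinks or Xena) for the log2(x + 1) scale.
to_log2p1 <- function(x) log2(x + 1)
from_log2p1 <- function(y) 2^y - 1

# First five SOX10 values from the legacy HiSeqV2 file, as quoted in the thread.
# Assumption: these are log2(normalized_count + 1).
legacy_log2 <- c(6.5221, 0, 8.308, 6.3628, 0.5819)

# Back-transform to the linear normalized-count scale.
legacy_linear <- from_log2p1(legacy_log2)
print(round(legacy_linear, 1))

# Raw counts would need the forward transform before being compared on a
# log2 scale, e.g. to_log2p1(as.numeric(assay(...))) on the SOX10 row.
```

Even after matching the scale, the values would not be expected to agree exactly, since the two pipelines differ in reference genome and quantification method.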