Are TCGA data from UCSC cancer browser and TCGAbiolinks different?
6.2 years ago
wenbinm ▴ 40

Hi there,

I downloaded TCGA BRCA RNA-seq data both from the UCSC Cancer Browser and with TCGAbiolinks, as follows:

library(TCGAbiolinks)
# Query HTSeq raw counts for the TCGA-BRCA project from the GDC
query <- GDCquery(project = 'TCGA-BRCA',
                  data.category = 'Transcriptome Profiling',
                  data.type = 'Gene Expression Quantification',
                  workflow.type = 'HTSeq - Counts')
GDCdownload(query)
brca.seq <- GDCprepare(query)  # returns a SummarizedExperiment

I then checked the expression of SOX10:

library(DESeq2)
r = rowData(brca.seq)
as.numeric(assay(brca.seq[which(r$external_gene_name == 'SOX10'),]))

It turns out that its expression is zero in all patients, but in the data from the UCSC Cancer Browser (HiSeqV2) the average SOX10 expression is 6. The UCSC data can be found here: https://tcga.xenahubs.net/download/TCGA.BRCA.sampleMap/HiSeqV2.gz

Another question: TCGAbiolinks is more up to date than the UCSC Cancer Browser because it downloads data directly from TCGA, right?

Thank you!

TCGA RNA-Seq • 2.3k views
6.2 years ago
mary ▴ 20

Hello,

Can you please tell me how you are seeing that the expression is 6 in UCSC Xena? I see 0 for all samples in the GDC TCGA BRCA cohort: https://xenabrowser.net/?bookmark=1c841f9f54e697573dc2d9aa5b6be22b (sorry about the red color; it appears because Xena is not sure how to color the samples when they all have the same value)

While technically the data from TCGAbiolinks will be more up-to-date than UCSC Xena, for this particular data there is unlikely to be a lag since it has been out for a long time.

Best, Mary


Thank you for your quick response! I downloaded the UCSC Xena data from here, unzipped it, and opened the file with Excel: https://tcga.xenahubs.net/download/TCGA.BRCA.sampleMap/HiSeqV2.gz

Then I looked at the SOX10 expression data, and the first 5 values are 6.5221, 0, 8.308, 6.3628, 0.5819. Maybe I made a mistake somewhere.
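
Opening a large expression matrix in Excel can be slow and may alter some gene symbols, so checking the row in R is one alternative. The following is a minimal sketch, assuming the matrix is tab-delimited with gene symbols in the first column and one column per sample. `get_gene_row` is a hypothetical helper, the sample IDs and the `GENE_X` row are placeholders, and only the five SOX10 values are taken from the post above.

```r
# Hypothetical helper: return one gene's values from a gene-by-sample matrix file.
get_gene_row <- function(file, gene) {
  # check.names = FALSE keeps sample IDs (which may contain hyphens) unchanged
  expr <- read.delim(file, row.names = 1, check.names = FALSE)
  if (!gene %in% rownames(expr)) {
    stop("Gene not found: ", gene)
  }
  # Flatten the one-row data frame into a named numeric vector
  unlist(expr[gene, ])
}

# Toy file standing in for the unzipped HiSeqV2 matrix (placeholder sample IDs).
toy <- tempfile(fileext = ".tsv")
writeLines(c(
  "sample\ts1\ts2\ts3\ts4\ts5",
  "GENE_X\t1\t2\t3\t4\t5",
  "SOX10\t6.5221\t0\t8.308\t6.3628\t0.5819"
), toy)

sox10 <- get_gene_row(toy, "SOX10")
print(sox10)        # per-sample values
print(mean(sox10))  # average across samples
```

For the real data, `file` would be the path to the downloaded and unzipped matrix.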


Ah, that is the legacy TCGA data, not the TCGA data from the GDC. As far as I can tell, TCGAbiolinks retrieves its data from the GDC. The GDC TCGA data on Xena is here: https://gdc.xenahubs.net/download/TCGA-BRCA/Xena_Matrices/TCGA-BRCA.htseq_fpkm-uq.tsv.gz.

As to why the legacy TCGA data is different from the TCGA data from the GDC, I recommend contacting the GDC: https://gdc.cancer.gov/support


The legacy TCGA data were aligned to hg19, whereas the TCGA data from the GDC now use hg38, so some differences are expected.
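
To see how large the difference is for a given gene, the values from the two releases can be paired sample by sample. The following is a minimal sketch, assuming each source gives a named numeric vector for the gene, keyed by TCGA barcode, and that the first 15 characters of a barcode (project, site, participant, sample type) are enough to pair samples. `pair_by_sample` is a hypothetical helper, and the barcodes and values are made-up placeholders, not real TCGA data.

```r
# Hypothetical helper: pair values from two sources by the first 15 barcode characters.
pair_by_sample <- function(legacy, gdc) {
  names(legacy) <- substr(names(legacy), 1, 15)
  names(gdc) <- substr(names(gdc), 1, 15)
  # Keep only samples present in both sources.
  # If truncation creates duplicate names, the first match is used.
  shared <- intersect(names(legacy), names(gdc))
  data.frame(sample = shared,
             legacy = unname(legacy[shared]),
             gdc = unname(gdc[shared]),
             stringsAsFactors = FALSE)
}

# Placeholder barcodes and values, for illustration only.
legacy <- c("TCGA-AA-0001-01" = 6.5, "TCGA-AA-0002-01" = 0.0, "TCGA-AA-0003-01" = 8.3)
gdc <- c("TCGA-AA-0001-01A" = 0, "TCGA-AA-0003-01A" = 0, "TCGA-AA-0004-01A" = 0)

paired <- pair_by_sample(legacy, gdc)
print(paired)
```

The two sources may also differ in units or transformation, so the paired values are best read as a rough side-by-side check rather than a direct numeric comparison.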
