TCGA data - GDC data portal vs. Firebrowse
7.1 years ago
JJ ▴ 710

Hi all,

Can someone explain why the TCGA patient numbers differ between the GDC Data Portal and Firebrowse? The overall patient numbers are more or less identical, but the numbers of patients with mRNA-seq data differ.

For example, for HNSC Firebrowse shows 520 patients with mRNA-seq data, whereas GDC shows 501. Why is that? Sometimes it's the other way around, as with the OV data.
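One way to pin down which patients account for such a difference, in either direction, is to export the mRNA-seq barcode lists from each source and compare them at the patient level. The sketch below assumes you already have those lists as plain strings; the function names `patient_id` and `compare_cohorts` are hypothetical helpers, not part of any GDC or Firebrowse tool. It relies only on the standard TCGA barcode layout, where the first three dash-separated fields (e.g. `TCGA-XX-0001`) identify the patient, so a tumor and a normal sample from the same patient collapse to one ID.

```python
def patient_id(barcode: str) -> str:
    """Reduce a TCGA barcode (patient, sample or aliquot level) to the
    12-character patient ID made of its first three fields."""
    parts = barcode.strip().upper().split("-")
    if len(parts) < 3 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA barcode: {barcode!r}")
    return "-".join(parts[:3])


def compare_cohorts(firebrowse_barcodes, gdc_barcodes):
    """Return the patients present only in each source and those in both.

    Inputs are iterables of barcode strings, e.g. read from files
    exported from each portal (an assumption: no download is done here).
    """
    fb = {patient_id(b) for b in firebrowse_barcodes}
    gdc = {patient_id(b) for b in gdc_barcodes}
    return {
        "only_firebrowse": sorted(fb - gdc),
        "only_gdc": sorted(gdc - fb),
        "shared": sorted(fb & gdc),
    }


if __name__ == "__main__":
    # Made-up placeholder barcodes, for illustration only.
    firebrowse = ["TCGA-XX-0001-01A", "TCGA-XX-0001-11A", "TCGA-XX-0002-01A"]
    gdc = ["TCGA-XX-0001-01A", "TCGA-XX-0003-01A"]
    print(compare_cohorts(firebrowse, gdc))
```

Here the Firebrowse list has three barcodes but only two patients, so it is also worth checking that both counts refer to patients rather than samples before comparing them.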

Thanks for your input!

RNA-Seq • 2.9k views
7.1 years ago
Sparrow_kop ▴ 260

Hi, the Firebrowse data come from the original TCGA project, whereas the GDC first runs the data through its harmonization pipelines, which may filter out some of it.

The GDC FAQ addresses this under the question "Why are some harmonized data files missing?":

"The GDC processes data through several harmonization pipelines. If the process of harmonization reveals issues in the underlying data or if an error occurred during harmonization, the harmonized data files (e.g. BAMs or VCFs) will not appear in GDC data access tools."


Thanks for your comment. I suspected it would be something like that. It would explain why GDC sometimes has fewer patients than Firebrowse, which is indeed the case for most cohorts. However, OV and GBM actually have more patients with mRNA-seq data in GDC. Does Firebrowse also filter data? Do you know?


From the linked page, I think Firebrowse also does some filtering, but almost all of the data are retained.
