Question

Interpretation of TCGA clinical data

9.1 years ago

bxia ▴ 180

When I parse the XML files from TCGA, I see entries like "age_at_initial_diagnosis", 63, precision = day. But when I check the TCGA website, the patient is listed as 63 years old, which confuses me given the precision = day attribute.

There are also several numbers in the XML for the days to last follow-up. Which one is correct? Some patients seem to have been followed up more than once, but when I did the calculation, the numbers do not match.

Thanks

RNA-Seq • 4.9k views

Last updated 6.6 years ago by igor.

Hi,

Can I ask how you parsed the XML files?

Reply by jan, 9.0 years ago.

Python's standard library includes XML (and JSON) modules that you might find helpful:

https://docs.python.org/2/library/xml.etree.elementtree.html

https://docs.python.org/2/library/json.html
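
A minimal sketch of parsing a clinical XML record with `xml.etree.ElementTree`, which also touches on the several follow-up numbers. TCGA clinical XML puts most elements in namespaces, so the helpers below match on the local tag name instead of spelling out each namespace. The toy XML is made up, and the tag names `age_at_initial_diagnosis` (taken from the question) and `days_to_last_followup` are assumptions; check the actual tags in your own files. Taking the maximum follow-up value assumes every follow-up record counts days from the initial diagnosis, so the most recent contact is the largest number.

```python
import xml.etree.ElementTree as ET


def local_name(name):
    """Drop the '{namespace}' prefix that ElementTree adds to namespaced names."""
    return name.rsplit("}", 1)[-1]


def collect_values(root, name):
    """Return (text, attributes) for every element whose local tag is `name`."""
    hits = []
    for elem in root.iter():
        if local_name(elem.tag) == name:
            attrs = {local_name(k): v for k, v in elem.attrib.items()}
            hits.append(((elem.text or "").strip(), attrs))
    return hits


def latest_followup_days(root, name="days_to_last_followup"):
    """Largest integer among all matching follow-up elements, or None.

    Assumes (unverified) that each follow-up record counts days from the
    initial diagnosis, so the most recent contact is the maximum.
    """
    days = []
    for text, _ in collect_values(root, name):
        try:
            days.append(int(text))
        except ValueError:
            continue  # blank or non-numeric value, e.g. a missing-data code
    return max(days) if days else None


# Made-up toy record; tag names are assumptions, not the real TCGA schema.
toy = """<patient xmlns:shared="http://example.org/shared">
  <shared:age_at_initial_diagnosis precision="day">63</shared:age_at_initial_diagnosis>
  <shared:days_to_last_followup>420</shared:days_to_last_followup>
  <follow_ups>
    <follow_up>
      <shared:days_to_last_followup>1105</shared:days_to_last_followup>
    </follow_up>
  </follow_ups>
</patient>"""

root = ET.fromstring(toy)  # for a file on disk: ET.parse(path).getroot()
print(collect_values(root, "age_at_initial_diagnosis"))  # [('63', {'precision': 'day'})]
print(latest_followup_days(root))  # 1105
```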

Reply by Charles Warden, 6.6 years ago.

score 3 · Answer 1 · 2018-12-14

6.6 years ago

Kristoffer Vitting-Seerup ★ 4.2k

I would suggest using the data provided by the TCGA CDR (Clinical Data Resource), which has been manually curated and concatenated. It is described in this paper and should solve many of these problems.
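
As far as I know, the CDR table already ships pre-computed survival endpoints (e.g. `OS` and `OS.time`), and those are preferable when present. To illustrate the underlying logic behind the follow-up question, here is a minimal sketch that derives an overall-survival (event, time) pair from one patient row: the death time for patients who died, otherwise the last-contact time as a censored observation. It assumes the clinical sheet has been saved as tab-separated text, and the column names `vital_status`, `death_days_to` and `last_contact_days_to` are assumptions to check against your file's header. The rows are made up.

```python
import csv
import io


def to_days(value):
    """Parse a day count; blanks and codes such as '[Not Available]' become None."""
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        return None


def overall_survival(row):
    """Return (event, days) for one patient row, or None if the time is missing.

    event is 1 for a death and 0 for a patient censored at last contact.
    Column names are assumptions; match them to your file's header.
    """
    if row.get("vital_status") == "Dead":
        days = to_days(row.get("death_days_to"))
        event = 1
    else:
        days = to_days(row.get("last_contact_days_to"))
        event = 0
    return (event, days) if days is not None else None


# Made-up rows standing in for a tab-separated export of the clinical sheet.
toy = (
    "bcr_patient_barcode\tvital_status\tdeath_days_to\tlast_contact_days_to\n"
    "P1\tDead\t512\t300\n"
    "P2\tAlive\t\t1105\n"
    "P3\tAlive\t\t[Not Available]\n"
)
# For a real file: csv.DictReader(open(path, newline=""), delimiter="\t")
for row in csv.DictReader(io.StringIO(toy), delimiter="\t"):
    print(row["bcr_patient_barcode"], overall_survival(row))
# P1 (1, 512)
# P2 (0, 1105)
# P3 None
```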

score 0 · Answer 2 · 2018-12-14

I personally find Xena the easiest way to download TCGA-related data. All the datasets are aggregated into a simple table format with consistent sample names.

For example, the Pan-Cancer data is here: https://xenabrowser.net/datapages/?cohort=TCGA%20Pan-Cancer%20(PANCAN)&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443
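
When combining sample-level tables with patient-level clinical data, the usual step is to trim the sample barcode. A minimal sketch, assuming the Xena tables are keyed by TCGA-style barcodes such as `TCGA-XX-0001-01` (the `XX` site code and numbers here are placeholders): the first three fields identify the patient, and the first two digits of the fourth field give the sample type (01-09 tumor, 10-19 normal, 20-29 control). The helper names and the clinical dictionary are hypothetical.

```python
def patient_id(sample_id):
    """First three barcode fields (project-TSS-participant), e.g. 'TCGA-XX-0001'."""
    return "-".join(sample_id.split("-")[:3])


def sample_type_code(sample_id):
    """Two-digit sample type from the fourth field ('01' or '01A' -> '01'), or None."""
    parts = sample_id.split("-")
    return parts[3][:2] if len(parts) > 3 else None


def is_tumor(sample_id):
    """Tumor sample types are 01-09 in the TCGA barcode convention."""
    code = sample_type_code(sample_id)
    return code is not None and code.isdigit() and 1 <= int(code) <= 9


def attach_clinical(sample_ids, clinical_by_patient):
    """Map each sample ID to its patient's clinical record (None if absent)."""
    return {s: clinical_by_patient.get(patient_id(s)) for s in sample_ids}


# Hypothetical IDs and clinical values, for illustration only.
samples = ["TCGA-XX-0001-01", "TCGA-XX-0001-11", "TCGA-XX-0002-01"]
clinical = {"TCGA-XX-0001": {"age": 63}}

tumors = [s for s in samples if is_tumor(s)]
print(tumors)  # ['TCGA-XX-0001-01', 'TCGA-XX-0002-01']
print(attach_clinical(tumors, clinical))
# {'TCGA-XX-0001-01': {'age': 63}, 'TCGA-XX-0002-01': None}
```

Keeping the lookup as a dictionary keyed by patient barcode makes missing clinical records show up explicitly as `None` instead of silently dropping samples.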