Interpretation of TCGA clinical data
8.4 years ago
bxia ▴ 180

When I parse the XML file from TCGA, I see something like "age_at_initial_diagnosis" with the value 63 and precision = day. When I check the TCGA website, though, the patient is listed as 63 years old.

There are also several numbers in the XML for days to last follow-up. Which number is correct? Some patients appear to have been followed up more than once, but when I did the calculation, the numbers did not match.

Thanks

RNA-Seq • 4.6k views

Hi,

Can I just ask how you parsed the XML files?


Python has some XML (and JSON) libraries that can be imported, which you might find helpful:

https://docs.python.org/2/library/xml.etree.elementtree.html

https://docs.python.org/2/library/json.html
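As a rough illustration of the ElementTree route, here is a minimal sketch that pulls every occurrence of a field out of a clinical XML record, together with its attributes. The XML below is a made-up stand-in, not a real TCGA file: the namespace URI, the nesting, and the values are assumptions, and the helper names `local_name` and `find_all` are hypothetical. It matches on the local tag name so it does not depend on the exact namespace prefixes used in a given file.

```python
import xml.etree.ElementTree as ET

# Made-up record for illustration only; real TCGA clinical XML has many
# more fields and different namespace URIs (assumption, not a real file).
SAMPLE_XML = """<record xmlns:shared="http://example.org/shared">
  <patient>
    <shared:age_at_initial_diagnosis precision="day">63</shared:age_at_initial_diagnosis>
    <shared:days_to_last_followup>400</shared:days_to_last_followup>
    <follow_ups>
      <follow_up>
        <shared:days_to_last_followup>900</shared:days_to_last_followup>
      </follow_up>
      <follow_up>
        <shared:days_to_last_followup>1250</shared:days_to_last_followup>
      </follow_up>
    </follow_ups>
  </patient>
</record>"""


def local_name(tag):
    """Strip the '{namespace-uri}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def find_all(root, name):
    """Return (text, attributes) for every element whose local name matches."""
    return [
        ((el.text or "").strip(), dict(el.attrib))
        for el in root.iter()
        if local_name(el.tag) == name
    ]


root = ET.fromstring(SAMPLE_XML)

# The attributes are kept so that things like precision="day" stay visible.
age = find_all(root, "age_at_initial_diagnosis")

# The same field can appear once per follow-up block, so collect all of them.
followups = find_all(root, "days_to_last_followup")
followup_days = [int(text) for text, _ in followups if text]

print(age)
print(followup_days, "latest:", max(followup_days))
```

Listing every occurrence side by side makes it easier to see why a single patient can carry several follow-up numbers. Taking the largest value as the most recent follow-up is one plausible reading, not something confirmed in this thread.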

6.0 years ago

I would suggest using the data provided by the TCGA CDR (Clinical Data Resource), which has been manually curated and concatenated. It is described in this paper and should solve many of those problems.

6.0 years ago
igor 13k

I personally find that Xena is the easiest way to download TCGA-related data. All the datasets are aggregated in a basic table format with consistent sample names.

For example, the Pan-Cancer data is here: https://xenabrowser.net/datapages/?cohort=TCGA%20Pan-Cancer%20(PANCAN)&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443
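Once a clinical table is downloaded in that basic table format, reading it takes only a few lines. The sketch below assumes a tab-separated file with one row per sample; the column names (`sample`, `age_at_diagnosis`, `OS`, `OS.time`) and the rows are placeholders for illustration, so check the header of the actual file before relying on them. The helpers `load_rows` and `to_int` are hypothetical names.

```python
import csv
import io

# Placeholder table standing in for a downloaded file; the column names
# and values are assumptions, not copied from a real dataset.
SAMPLE_TSV = (
    "sample\tage_at_diagnosis\tOS\tOS.time\n"
    "TCGA-XX-0001-01\t63\t0\t1250\n"
    "TCGA-XX-0002-01\t71\t1\t\n"
)


def load_rows(handle):
    """Index the rows of a tab-separated table by their sample name."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["sample"]: row for row in reader}


def to_int(value):
    """Convert a cell to int, returning None for empty or missing cells."""
    value = (value or "").strip()
    return int(value) if value else None


# For a real download, replace io.StringIO(...) with open(path, newline="").
rows = load_rows(io.StringIO(SAMPLE_TSV))

for sample, row in rows.items():
    print(sample, to_int(row["age_at_diagnosis"]), to_int(row["OS.time"]))
```

Empty cells are returned as None rather than 0, so a missing follow-up time is not mistaken for a follow-up of zero days.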
