Question

Retrieving data from the new TCGA data portal

0

8.4 years ago

noorpratap.singh ▴ 330

I am retrieving RNA-seq data from the TCGA data portal at https://gdc-portal.nci.nih.gov/. I can download the different expression-level data and the clinical data, but I am struggling to map one to the other: I don't know how to match a given patient's ID in the mRNA data to the corresponding ID in the clinical data. Is there a way to sort this out? Thanks in advance.

RNA-Seq Annotation gdc-portal • 7.5k views

ADD COMMENT • link updated 8.2 years ago by rli012 • 0 • written 8.4 years ago by noorpratap.singh ▴ 330

1

Have they changed the structure of the TCGA barcodes?

ADD REPLY • link 8.4 years ago by russhh 5.7k

score 1 · Answer 1 · 2016-07-07

1

8.4 years ago

Mike ★ 1.9k

You can match by TCGA barcode: match the "patient.bcr_patient_barcode" column of the clinical data against the expression sample IDs.
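As a rough sketch of that matching step, assuming the expression sample IDs are full TCGA barcodes whose first 12 characters identify the patient. The barcodes, the clinical fields, and the helper name `match_samples` below are made-up placeholders, not real data or an existing API:

```python
# Hypothetical clinical records keyed by patient barcode (placeholder values).
clinical = {
    "TCGA-AA-0001": {"gender": "female"},
    "TCGA-AA-0002": {"gender": "male"},
}

# Hypothetical expression sample barcodes (placeholder values).
expression_samples = [
    "TCGA-AA-0001-01A-11R-A000-07",
    "TCGA-AA-0002-01A-11R-A000-07",
    "TCGA-AA-0003-01A-11R-A000-07",  # no clinical record in this toy example
]

def match_samples(samples, clinical_by_patient):
    """Pair each sample barcode with its patient's clinical record, if any."""
    matched = {}
    for sample in samples:
        patient = sample[:12]  # TCGA-XX-XXXX: project, tissue source site, participant
        if patient in clinical_by_patient:
            matched[sample] = clinical_by_patient[patient]
    return matched

matched = match_samples(expression_samples, clinical)
print(matched)
```

Samples without a clinical record are simply skipped here; you may prefer to collect them separately to see what failed to match.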

ADD COMMENT • link 8.4 years ago by Mike ★ 1.9k

0

The thing is that the clinical data consists entirely of XML files, though I can still extract the patient barcode from them. The problem is with the expression data: each file comes with a sample ID that does not seem to correspond to a barcode.

ADD REPLY • link 8.4 years ago by noorpratap.singh ▴ 330

score 0 · Answer 2 · 2016-07-10

0

8.4 years ago

noorpratap.singh ▴ 330

I got it in the end. I had not looked at the metadata file that comes with the download; it gives the one-to-one correspondence between sample ID and file name.

ADD COMMENT • link 8.4 years ago by noorpratap.singh ▴ 330

0

May I ask how you mapped a given patient's ID in the mRNA data to the corresponding ID in the clinical data? I've seen the MANIFEST you mentioned, but the IDs in the clinical data and the mRNA data are not consistent...

ADD REPLY • link 8.3 years ago by Ann • 0

0

Please refer to the answer below. Sorry for the late reply.

ADD REPLY • link 8.2 years ago by noorpratap.singh ▴ 330

score 0 · Answer 3 · 2016-09-08

0

8.2 years ago

rli012 • 0

Hi Noorpratap, could you please explain how to link the biospecimen/clinical data to the transcriptome profiling data? Thanks

ADD COMMENT • link 8.2 years ago by rli012 • 0

1

When you download the respective files (expression, clinical, biospecimen, etc.), there is also an option to download the metadata file along with them. That is a JSON file, and for each entry there is a field 'entity_submitter_id' holding a TCGA barcode (TCGA-..-...) that identifies the patient and is common across all the respective metadata files. For mRNA, though, the barcode is an extended form that also encodes the sample type (normal or tumor), so you have to truncate it to map it to a patient. As a result, the mRNA data can have more entries than there are patients, with two or more entries for the same patient corresponding to tumor and adjacent normal tissue samples. For the clinical and biospecimen files, however, the total number of entries equals the total number of patients. For more information about barcodes, follow the link posted by @russhh above.
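To illustrate the "break that for mRNA" step, here is a minimal sketch. It assumes the standard TCGA barcode layout, where the first three dash-separated fields give the patient and the fourth field starts with a two-digit sample type code (01 to 09 for tumor, 10 to 19 for normal). The function name `parse_tcga_barcode` and the barcodes are placeholders for illustration:

```python
def parse_tcga_barcode(barcode):
    """Split a TCGA barcode into the patient ID and a coarse sample category."""
    fields = barcode.split("-")
    patient_id = "-".join(fields[:3])  # e.g. TCGA-XX-XXXX
    if len(fields) < 4:
        # Patient-level barcode (as in clinical files): no sample field present.
        return patient_id, None
    sample_code = int(fields[3][:2])  # two-digit sample type, e.g. 01 or 11
    if 1 <= sample_code <= 9:
        category = "tumor"
    elif 10 <= sample_code <= 19:
        category = "normal"
    else:
        category = "other"
    return patient_id, category

# Placeholder barcodes for illustration only.
print(parse_tcga_barcode("TCGA-AA-0001-01A-11R-A000-07"))
print(parse_tcga_barcode("TCGA-AA-0001-11A-11R-A000-07"))
```

Both placeholder barcodes resolve to the same patient ID, which is why the mRNA data can list a patient more than once.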

ADD REPLY • link 7.3 years ago by noorpratap.singh ▴ 330

1

Just in case anyone needs it, here is some example code in Python because JSON is fiddly.

If you download the metadata file along with your FPKM data, you can match your files to their info like this:

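import json

# Metadata file downloaded alongside the data; the rest of the name is elided here.
file_name = 'metadata.cart.2017-06-RESTOFID.json'

with open(file_name) as data_file:
    data = json.load(data_file)

# Print each file name next to the TCGA barcode of its first associated entity.
for entry in data:
    print(entry['file_name'], entry['associated_entities'][0]['entity_submitter_id'])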
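Building on the loop above, a possible next step is to group the file names by patient so they can be looked up against the clinical data. This is only a sketch: it assumes the metadata entries have the same `file_name` and `associated_entities` fields used above, and the helper name `files_by_patient`, the file names, and the barcodes are made up for illustration.

```python
from collections import defaultdict

def files_by_patient(metadata):
    """Group file names by patient barcode from a loaded metadata list."""
    grouped = defaultdict(list)
    for entry in metadata:
        barcode = entry["associated_entities"][0]["entity_submitter_id"]
        patient = "-".join(barcode.split("-")[:3])  # keep TCGA-XX-XXXX only
        grouped[patient].append(entry["file_name"])
    return dict(grouped)

# Toy stand-in for the json.load(...) result; all values are placeholders.
example_metadata = [
    {"file_name": "a.FPKM.txt.gz",
     "associated_entities": [{"entity_submitter_id": "TCGA-AA-0001-01A-11R-A000-07"}]},
    {"file_name": "b.FPKM.txt.gz",
     "associated_entities": [{"entity_submitter_id": "TCGA-AA-0001-11A-11R-A000-07"}]},
    {"file_name": "c.FPKM.txt.gz",
     "associated_entities": [{"entity_submitter_id": "TCGA-AA-0002-01A-11R-A000-07"}]},
]

grouped = files_by_patient(example_metadata)
print(grouped)
```

With real data you would pass the list returned by `json.load` instead of `example_metadata`.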

ADD REPLY • link 7.4 years ago by Michele Busby ★ 2.2k