Mapping IDs and file names from TCGA datasets
21 months ago
Jaehyun ▴ 10

Hello,

I want to analyze multiple files from the TCGA-BRCA project that I downloaded from the GDC portal. However, I am having trouble combining different data types from the same samples.

For example, the case TCGA-E2-A1IU has both proteome profiling and DNA methylation data. The problem is that the methylation file has its own UUID-style name (e2ae1cb4-0422-4c7d-9878-ba8449afaacd.methylation_array.sesame.level3betas.txt), and neither the file name nor the file contents include a case ID or barcode.

I searched the other files for the case, such as the biospecimen and clinical data files, but I could not find anything that maps case IDs to file IDs. I could write out every case ID and file name pair by hand, but that would be inefficient.

I would appreciate it if you could suggest a way to solve this problem.

Thank you.

TCGA • 1.3k views
21 months ago
pilargmarch ▴ 110

What I usually do is add all of the desired files to the cart and then download the sample sheet, which looks like this:

File ID | File Name | Data Category | Data Type | Project ID | Case ID | Sample ID | Sample Type
efbd072e-9d71-4729-84d8-3eb34996078f | e2ae1cb4-0422-4c7d-9878-ba8449afaacd.methylation_array.sesame.level3betas.txt | DNA Methylation | Methylation Beta Value | TCGA-BRCA | TCGA-E2-A1IU | TCGA-E2-A1IU-01A | Primary Tumor
fc52a9be-ac8d-41ba-a9d2-a6dc556bc2cf | TCGA-E2-A1IU-01A-21-A17J-20_RPPA_data.tsv | Proteome Profiling | Protein Expression Quantification | TCGA-BRCA | TCGA-E2-A1IU | TCGA-E2-A1IU-01A | Primary Tumor
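If you would rather not look things up in the sample sheet by eye, a few lines of Python can turn it into a lookup table. This is only a minimal sketch: it assumes the sheet is tab-separated with exactly the column names shown above, and the function names `read_sample_sheet` and `files_by_case` are made up for illustration. The two example rows are the ones from the sheet above.

```python
import csv
import io
from collections import defaultdict


def read_sample_sheet(handle):
    """Return {file name: row dict} from a tab-separated GDC sample sheet."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["File Name"]: row for row in reader}


def files_by_case(sheet):
    """Group file names as {Case ID: {Data Type: [file names]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for file_name, row in sheet.items():
        grouped[row["Case ID"]][row["Data Type"]].append(file_name)
    # Convert to plain dicts so missing keys raise instead of being created.
    return {case: dict(types) for case, types in grouped.items()}


# The two rows shown above, rebuilt as tab-separated text for the example.
HEADER = ["File ID", "File Name", "Data Category", "Data Type",
          "Project ID", "Case ID", "Sample ID", "Sample Type"]
ROWS = [
    ["efbd072e-9d71-4729-84d8-3eb34996078f",
     "e2ae1cb4-0422-4c7d-9878-ba8449afaacd.methylation_array.sesame.level3betas.txt",
     "DNA Methylation", "Methylation Beta Value",
     "TCGA-BRCA", "TCGA-E2-A1IU", "TCGA-E2-A1IU-01A", "Primary Tumor"],
    ["fc52a9be-ac8d-41ba-a9d2-a6dc556bc2cf",
     "TCGA-E2-A1IU-01A-21-A17J-20_RPPA_data.tsv",
     "Proteome Profiling", "Protein Expression Quantification",
     "TCGA-BRCA", "TCGA-E2-A1IU", "TCGA-E2-A1IU-01A", "Primary Tumor"],
]
EXAMPLE = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS) + "\n"

sheet = read_sample_sheet(io.StringIO(EXAMPLE))
by_case = files_by_case(sheet)
```

With a real download you would pass an open file instead of the `io.StringIO` example, e.g. `open(path, newline="")` where `path` points at your sample sheet. A list is kept per data type because one case can have several samples, and therefore several files, of the same type.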

It is annoying that there's no easier way to do this (I'm sure you could do it with the API, though). The alternative I use most often is the R/Bioconductor package TCGAbiolinks, which handles all the files and names on its own. You simply query your patients (e.g. TCGA-E2-A1IU) and your desired data type (e.g. Reverse Phase Protein Array). Each query can only handle one tumor type (like BRCA) and one data type at a time.

So, for example, if you have 5 BRCA patients and 3 data types you're interested in, you'd run 3 queries and get 3 different objects, each with columns corresponding to the 5 patients.
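For the API route, here is a rough sketch of what I would try, not something I have run against this exact case. It assumes the GDC `files` endpoint accepts a POST with a JSON filter on `cases.submitter_id` and returns hits under `data.hits`, which is how I remember the GDC API documentation describing it, so please check the docs before relying on it. The helper names `build_query`, `parse_hits` and `fetch_mapping` are made up for illustration.

```python
import json
import urllib.request

GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_query(case_ids, size=100):
    """Build the JSON body asking for all files that belong to the given cases."""
    return {
        "filters": {
            "op": "in",
            "content": {"field": "cases.submitter_id", "value": list(case_ids)},
        },
        # Only request the fields needed to map file names back to cases.
        "fields": "file_id,file_name,data_type,cases.submitter_id",
        "format": "JSON",
        "size": size,
    }


def parse_hits(response):
    """Turn a decoded response into {file name: [case IDs]}.

    Assumes the response shape {"data": {"hits": [...]}}.
    """
    mapping = {}
    for hit in response["data"]["hits"]:
        cases = hit.get("cases", [])
        mapping[hit["file_name"]] = [case["submitter_id"] for case in cases]
    return mapping


def fetch_mapping(case_ids):
    """Send the query to the GDC API and return the parsed mapping (needs network)."""
    body = json.dumps(build_query(case_ids)).encode("utf-8")
    request = urllib.request.Request(
        GDC_FILES_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return parse_hits(json.load(reply))
```

Calling `fetch_mapping(["TCGA-E2-A1IU"])` should then give you the file names for that case. Note that `size` caps the number of hits returned, so raise it if you query many cases at once.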


Thank you so much! I think I can try both the sample sheet and the R package.
