How to find sample type information from TCGA?
8.4 years ago
aichen ▴ 100

I downloaded batches of gene expression files from the TCGA GDC (about 600 files), and each file contains only one sample. I want to know which files are tumor samples and which are normal samples, because my next step is to find differentially expressed genes from these data. However, I don't know how to find sample type information in TCGA. Can anyone help? What I want is something like the following, as on the TCGA wiki, because I couldn't find a similar tool in the GDC.

I hope I can download a tab-delimited file containing this information:

study  **barcode**                   disease  disease_name                  **sample_type**  sample_type_name  analyte_type  library_type  center    center_name  platform  **filename**
TCGA   TCGA-56-7222-01A-11R-2045-07  LUSC     Lung squamous cell carcinoma  TP               01                RNA           RNA-Seq       UNC-LCCC  UNC-LCCC     ILLUMINA  UNCID_2195465.13daa1a0-a236-474e-b621-eb131be34af1.120305_UNC16-SN851_0133_AC0JB6ACXX_6_GGCTAC.tar.gz
RNA-Seq TCGA • 12k views

Thanks for answering, but do you know how to get the TCGA barcode that corresponds to a given file name? For example, the file "UNCID_1840921.148d34df-aec2-42bf-8e36-b91a68959606.sorted_genome_alignments.bam" belongs to "TCGA-43-7657-11A-01R-2125-07".


I used to get tab-delimited files, but I haven't found a similar filter tool in the GDC.


Yes, the GDC is a bit confusing right now. I hope they improve the documentation in the future.

Select Data -> cancer program TCGA + project TCGA-XXXX (in the Cases tab) -> Data format TSV (in the Files tab)

Then click Download Manifest.

I hope this works.

Or try the GDC API:

https://gdc-docs.nci.nih.gov/API/PDF/API_UG.pdf#page50

Manifest endpoint
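
The manifest alone does not carry sample types, so a possible alternative is to query the GDC API `files` endpoint and ask for the sample fields directly. The snippet below is only a sketch that builds the request URL without sending it. The field names (`cases.samples.submitter_id`, `cases.samples.sample_type`) and the project ID `TCGA-LUSC` are assumptions for illustration, so check them against the API guide before relying on them.

```python
import json
from urllib.parse import urlencode

# Public GDC API endpoint for file records.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_files_query(project_id, size=1000):
    """Build a GET URL asking the files endpoint for file names plus sample info.

    The field list below is an assumption; adjust it to the fields you need.
    """
    filters = {
        "op": "in",
        "content": {
            "field": "cases.project.project_id",
            "value": [project_id],
        },
    }
    params = {
        "filters": json.dumps(filters),
        "fields": ",".join([
            "file_id",
            "file_name",
            "cases.samples.submitter_id",
            "cases.samples.sample_type",
        ]),
        "format": "TSV",   # ask for tab-delimited output instead of JSON
        "size": str(size),  # maximum number of records to return
    }
    return GDC_FILES_ENDPOINT + "?" + urlencode(params)


# Example project ID, chosen only for illustration.
url = build_files_query("TCGA-LUSC")
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return a tab-delimited table if the fields are valid for the current API version.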


Thanks for the reply! Unfortunately it didn't work, because the file I got looks like this:

id                                    filename                     md5                               size    state
b99b9f44-00d2-443c-93f2-b0357491ed63  isoforms.quantification.txt  a20f30bc3fe55fae1433949495884514  358879  submitted
1a657f88-2a88-4c46-b7ce-4d48c6d6ba15  isoforms.quantification.txt  3042012a718a35a59acd74bd97a1d257  410661  submitted

I found that if I click Download and choose "Biospecimen" or "File metadata", I can get this information, including the TCGA barcode, file name, etc., but the format is ".json" :(

Thank you all the same!


That's great. You can try converting the JSON to CSV or TSV using a tool like this:

http://www.convertcsv.com/json-to-csv.htm

I hope it works.
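
If you would rather not paste the file into a website, the conversion can also be sketched in a few lines of Python. This assumes the "File metadata" JSON is a list of records, each with `file_id`, `file_name`, and an `associated_entities` list whose items carry the barcode in `entity_submitter_id`. That layout is an assumption about the download, and the record below is a made-up placeholder, not real GDC data.

```python
import csv
import io
import json


def metadata_to_rows(records):
    """Flatten metadata records into (file_id, file_name, barcode) tuples.

    Assumes each record has an 'associated_entities' list whose items hold
    the TCGA barcode under 'entity_submitter_id' (an assumed layout).
    """
    rows = []
    for rec in records:
        for entity in rec.get("associated_entities", []):
            rows.append((
                rec.get("file_id", ""),
                rec.get("file_name", ""),
                entity.get("entity_submitter_id", ""),
            ))
    return rows


def rows_to_tsv(rows):
    """Render the rows as tab-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["file_id", "file_name", "barcode"])
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical placeholder record; a real file would be loaded with
# json.load(open("metadata.json")) instead.
example = json.loads("""
[
  {
    "file_id": "example-file-id",
    "file_name": "isoforms.quantification.txt",
    "associated_entities": [
      {"entity_submitter_id": "TCGA-XX-XXXX-01A-11R-XXXX-07"}
    ]
  }
]
""")

tsv_text = rows_to_tsv(metadata_to_rows(example))
print(tsv_text)
```

The resulting table has one row per file and barcode pair, which is the filename-to-barcode mapping asked about earlier in the thread.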


Great! Thanks a lot :)


Broad-GDAC could be an alternative way to download TCGA data:

https://gdac.broadinstitute.org/


Thanks, it can export pdf.


Hi, did you solve the problem of converting a GDC file name to a TCGA barcode to find tumor or normal sample information?


Did you solve the problem? I have the same problem converting a file name to a TCGA barcode to find tumor/normal information.
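
Once a barcode is known, the tumor/normal call can be read from the barcode itself: the first two digits of the fourth dash-separated field are the sample type code, with 01 to 09 for tumor, 10 to 19 for normal, and 20 to 29 for control samples. A minimal sketch follows; the function name `sample_class` is made up for illustration, and the two barcodes are the ones quoted earlier in this thread.

```python
def sample_class(barcode):
    """Classify a TCGA barcode as tumor, normal, or control.

    Uses the two-digit sample type code at the start of the fourth
    dash-separated field (for example '01' in '01A').
    """
    parts = barcode.split("-")
    if len(parts) < 4 or not parts[3][:2].isdigit() or len(parts[3]) < 2:
        raise ValueError("barcode has no sample field: " + barcode)
    code = int(parts[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "unknown"


for bc in ["TCGA-56-7222-01A-11R-2045-07", "TCGA-43-7657-11A-01R-2125-07"]:
    print(bc, sample_class(bc))
```

Patient-level barcodes that stop after the third field carry no sample code, so the sketch raises an error for them rather than guessing.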
