After downloading a batch of gene expression files from the TCGA GDC (about 600 files, each containing only one sample), I want to know which files are tumor samples and which are normal samples, so that I can find differentially expressed genes in the next step. However, I don't know how to find sample type information in TCGA. Can anyone help? What I want is something like this (a wiki of TCGA), because I didn't find a similar tool in the GDC.
I hope I can download a tab-delimited file containing this information:
| study | **barcode** | disease | disease_name | **sample_type** | sample_type_name | analyte_type | library_type | center | center_name | platform | **filename** |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TCGA | TCGA-56-7222-01A-11R-2045-07 | LUSC | Lung squamous cell carcinoma | TP | 01 | RNA | RNA-Seq | UNC-LCCC | UNC-LCCC | ILLUMINA | UNCID_2195465.13daa1a0-a236-474e-b621-eb131be34af1.120305_UNC16-SN851_0133_AC0JB6ACXX_6_GGCTAC.tar.gz |
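Once you have a barcode, tumor vs. normal can be read from the barcode itself. Under the TCGA barcode convention, the first two digits of the fourth field (the "01" in "01A") are the sample type code: 01-09 are tumor, 10-19 are normal and 20-29 are control samples. Below is a minimal Python sketch of that rule. The function names are hypothetical, and the code-to-name table lists only a few common codes.

```python
import re

# A few common TCGA sample type codes (not an exhaustive list).
SAMPLE_TYPE_NAMES = {
    "01": "Primary Solid Tumor",
    "02": "Recurrent Solid Tumor",
    "06": "Metastatic",
    "10": "Blood Derived Normal",
    "11": "Solid Tissue Normal",
}

# Matches project-TSS-participant and captures the two digits of the sample field.
_BARCODE_RE = re.compile(r"^TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}-(\d{2})")


def sample_code(barcode: str) -> str:
    """Return the two-digit sample type code from a sample- or aliquot-level barcode."""
    match = _BARCODE_RE.match(barcode.strip().upper())
    if match is None:
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    return match.group(1)


def tumor_or_normal(barcode: str) -> str:
    """Classify a barcode by the range its sample type code falls in."""
    code = int(sample_code(barcode))
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    raise ValueError(f"sample type code {code:02d} is outside the tumor/normal/control ranges")
```

For example, `tumor_or_normal("TCGA-56-7222-01A-11R-2045-07")` returns `"tumor"` and `tumor_or_normal("TCGA-43-7657-11A-01R-2125-07")` returns `"normal"`.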
Thanks for answering! But do you know how to get the TCGA barcode corresponding to a given file name? For example, the file "UNCID_1840921.148d34df-aec2-42bf-8e36-b91a68959606.sorted_genome_alignments.bam" belongs to "TCGA-43-7657-11A-01R-2125-07".
I used to get tab-delimited files, but I didn't find a similar filter tool in the GDC.
Yes, the GDC is a bit confusing right now. I hope they improve the documentation in the future.
Select Data -> cancer program TCGA + project TCGA-XXXX (in the Cases tab) -> data format TSV (in the Files tab)
Then click Download Manifest.
I hope this works.
Or try the GDC API:
https://gdc-docs.nci.nih.gov/API/PDF/API_UG.pdf#page50
Manifest endpoint
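For mapping file names to barcodes, the GDC API can answer in one request instead of going through the portal. Below is a minimal Python sketch using only the standard library. It assumes the `https://api.gdc.cancer.gov/files` endpoint and the `file_name`, `cases.samples.submitter_id` and `cases.samples.sample_type` fields, so check these against the current API documentation. Files from older downloads (such as the `UNCID_...` BAM names) may not be found under the same names in current GDC records. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed current GDC API host; the PDF guide may describe an older host name.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"

# Return each file's name plus the barcode and type of its sample(s).
FIELDS = ["file_name", "cases.samples.submitter_id", "cases.samples.sample_type"]


def build_files_query(field, values, size=1000):
    """Build a POST body matching files whose `field` equals any of `values`."""
    return {
        "filters": {"op": "in", "content": {"field": field, "value": list(values)}},
        "fields": ",".join(FIELDS),
        "format": "JSON",
        "size": str(size),
    }


def fetch_hits(payload):
    """POST the query to the files endpoint and return the matching records."""
    req = urllib.request.Request(
        GDC_FILES_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]["hits"]


def hits_to_rows(hits):
    """Flatten records into (file_name, sample barcode, sample type) tuples."""
    rows = []
    for hit in hits:
        for case in hit.get("cases", []):
            for sample in case.get("samples", []):
                rows.append((hit["file_name"], sample["submitter_id"], sample["sample_type"]))
    return rows
```

Usage (needs network access): `hits_to_rows(fetch_hits(build_files_query("file_name", names)))`, where `names` is your list of downloaded file names. Filtering on `file_id` with the UUIDs from a manifest's `id` column should work the same way. `cases.samples.submitter_id` is the sample-level barcode (e.g. `TCGA-43-7657-11A`), which already contains the sample type digits.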
Thanks for replying! But I'm sorry to say it didn't work, because the file I got looks like this:

| id | filename | md5 | size | state |
|---|---|---|---|---|
| b99b9f44-00d2-443c-93f2-b0357491ed63 | isoforms.quantification.txt | a20f30bc3fe55fae1433949495884514 | 358879 | submitted |
| 1a657f88-2a88-4c46-b7ce-4d48c6d6ba15 | isoforms.quantification.txt | 3042012a718a35a59acd74bd97a1d257 | 410661 | submitted |

I found that if I click Download and choose "Biospecimen" or "File metadata", I can get this information, including the TCGA barcode, file name, etc., but the format is JSON :(
Thank you all the same!
That's great. You can try converting the JSON to CSV or TSV with a tool like this one:
http://www.convertcsv.com/json-to-csv.htm
I hope it works.
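The conversion can also be done locally. Below is a minimal Python sketch. It assumes the "File metadata" download is a JSON list of file records, each with `file_name`, `file_id` and an `associated_entities` list whose `entity_submitter_id` holds the TCGA barcode; verify these keys against your own file. The function names are hypothetical.

```python
import csv
import json


def metadata_to_rows(records):
    """Yield one (file_name, file_id, barcode) tuple per associated entity.

    Assumes each record has "file_name", optionally "file_id", and an
    "associated_entities" list whose items carry "entity_submitter_id".
    """
    for record in records:
        for entity in record.get("associated_entities", []):
            yield record["file_name"], record.get("file_id", ""), entity["entity_submitter_id"]


def convert(json_path, tsv_path):
    """Read a GDC metadata JSON file and write a tab-delimited table."""
    with open(json_path) as src:
        records = json.load(src)
    with open(tsv_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerow(["file_name", "file_id", "barcode"])
        writer.writerows(metadata_to_rows(records))
```

For example, `convert("metadata.json", "files.tsv")` (placeholder paths) writes one row per file/barcode pair; the sample type digits in each barcode then tell you tumor vs. normal.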
Great! Thanks a lot :)
The Broad GDAC could be an alternative way to download TCGA data:
https://gdac.broadinstitute.org/
Thanks, it can export PDF.
Hi, did you solve the problem of converting GDC file names to TCGA barcodes to find tumor or normal sample information?
Did you solve the problem? I have the same problem converting file names to TCGA barcodes to find tumor/normal information.