Update, 12th May 2018
Since this post was made, a faster way to query the GDC and convert UUIDs to TCGA barcodes has been found using R. See the thread here: Sample names for TCGA data from GDC-legacy archive
Kevin,
Moderator.
For those not familiar with the command line or the JSON query language, here is a fairly simple way to map UUIDs to TCGA barcodes using R and a ready-made command in the terminal.
The first part is in R:
1) Extract the file IDs from your manifest file (the one you got from the GDC when you downloaded your data):
setwd("C:/Here/your/manifest/directory")
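manifest= "gdc_manifest_20160921_171519.txt" # manifest file name
x= read.delim(manifest, stringsAsFactors = FALSE) # tab-separated manifest with an "id" column
manifest_length= nrow(x) # number of files; used as the query "size"
id= toString(sprintf('"%s"', x$id)) # "uuid1", "uuid2", ...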
2) Create Payload.txt containing the JSON query
These commands are taken from the GDC API documentation: https://gdc-docs.nci.nih.gov/API/Users_Guide/Search_and_Retrieval/
Part1= '{"filters":{"op":"in","content":{"field":"files.file_id","value":[ '
Part2= '] }},"format":"TSV","fields":"file_id,file_name,cases.submitter_id,cases.case_id,data_category,data_type,cases.samples.tumor_descriptor,cases.samples.tissue_type,cases.samples.sample_type,cases.samples.submitter_id,cases.samples.sample_id,cases.samples.portions.analytes.aliquots.aliquot_id,cases.samples.portions.analytes.aliquots.submitter_id","size":'
Part3= paste(manifest_length, "}", sep="") # size must be a bare number; shQuote() here produces invalid JSON
Sentence= paste(Part1, id, Part2, Part3)
writeLines(Sentence, "Payload.txt")
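Building the JSON by pasting strings together is where quoting mistakes creep in. As an alternative sketch, assuming the jsonlite package is installed, the same kind of query can be generated from an R list, so the ID list is always serialized as a JSON array and the size as a plain number. The function name build_gdc_payload is hypothetical and not part of any GDC tool.

```r
library(jsonlite)

# Hypothetical helper: build a GDC /files query for a vector of file UUIDs.
build_gdc_payload <- function(file_ids, fields) {
  query <- list(
    filters = list(
      op = "in",
      # I() stops auto_unbox from turning a single ID into a scalar;
      # the "in" operator expects an array of values.
      content = list(field = "files.file_id", value = I(file_ids))
    ),
    format = "TSV",
    fields = paste(fields, collapse = ","),  # comma-separated field list
    size = length(file_ids)                  # serialized as a bare number
  )
  toJSON(query, auto_unbox = TRUE)
}
```

With the manifest from step 1 loaded as x, something like writeLines(build_gdc_payload(x$id, c("file_id", "file_name", "cases.submitter_id")), "Payload.txt") would take the place of the Part1/Part2/Part3 lines; add whichever of the fields listed above you need.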
The second part is in the command line (CMD or terminal):
cd C:/Here/your/manifest/directory
curl --request POST --header "Content-Type: application/json" --data @Payload.txt "https://api.gdc.cancer.gov/files" > File_metadata.txt
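Now you should have a file called File_metadata.txt in your working folder: a tab-separated table with the requested fields (file ID, file name, case and sample barcodes, and so on) for every file in the manifest.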
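To get a simple UUID-to-barcode table back into R, a sketch like the one below reads File_metadata.txt and keeps one barcode column. It assumes the GDC flattens nested fields in TSV output into column names such as "cases.0.samples.0.submitter_id"; that exact name is an assumption, so check names() of the table first. The function name uuid_to_barcode is hypothetical.

```r
# Hypothetical helper: map file UUIDs to a barcode column from the GDC TSV.
# The default barcode_col is an ASSUMPTION about the flattened column name.
uuid_to_barcode <- function(meta_file = "File_metadata.txt",
                            barcode_col = "cases.0.samples.0.submitter_id") {
  # check.names = FALSE keeps the GDC column names exactly as written
  meta <- read.delim(meta_file, stringsAsFactors = FALSE, check.names = FALSE)
  if (!barcode_col %in% names(meta)) {
    stop("Column '", barcode_col, "' not found; available columns: ",
         paste(names(meta), collapse = ", "))
  }
  data.frame(file_id = meta$file_id,
             file_name = meta$file_name,
             barcode = meta[[barcode_col]],
             stringsAsFactors = FALSE)
}
```

If you need aliquot-level barcodes instead, pass the corresponding aliquot submitter_id column as barcode_col.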
If you get a message:
'curl' is not recognized as an operable program or batch file.
you need to install cURL on your computer (if you don't know how, follow this link).
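If installing cURL is not an option, the same POST can be sent from R instead. This is a sketch assuming the httr package is installed; post_gdc_payload is a hypothetical name, and the endpoint default uses the api.gdc.cancer.gov host reported further down this thread.

```r
library(httr)

# Hypothetical helper mirroring the curl command: POST Payload.txt as JSON
# to the GDC files endpoint and save the TSV response to disk.
post_gdc_payload <- function(payload_file = "Payload.txt",
                             out_file = "File_metadata.txt",
                             endpoint = "https://api.gdc.cancer.gov/files") {
  body <- paste(readLines(payload_file, warn = FALSE), collapse = "\n")
  # A character body is sent as-is; content_type_json() sets the header
  resp <- POST(endpoint, body = body, content_type_json())
  stop_for_status(resp)  # turn 4xx/5xx responses into an R error
  writeLines(content(resp, as = "text", encoding = "UTF-8"), out_file)
  invisible(out_file)
}
```

Calling post_gdc_payload() from the manifest directory should then produce File_metadata.txt, as the curl command does.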
Hi, I tried to use this method to retrieve the sample IDs, but it failed. The error in File_metadata.txt is: { "message": "400 Bad Request: The browser (or proxy) sent a request that this server could not understand." }
How can I fix it? Thanks.
Can you post your code?
Another answer here: A: Sample names for TCGA data from GDC-legacy archive. Also check the blog of Seán Davis.
Thank you. It worked for my prostate cancer RNA-seq data.
Thanks for the post, it was very useful.
Thanks for this convenient solution!
I ran into two small issues when implementing this:
1) The current URL for this search is "https://api.gdc.cancer.gov/files" rather than "https://gdc-api.nci.nih.gov/files". With the latter I get "curl: (6) Could not resolve host: gdc-api.nci.nih.gov; Unknown error".
2) In the R script, the line Part3= paste(shQuote(manifest_length),"}",sep="") puts the size in single quotes, which caused an error in my search. After removing shQuote(), i.e. Part3= paste(manifest_length,"}",sep=""), it worked for me.
Thank you very much!
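A 400 Bad Request like the one reported above can come from malformed JSON such as a single-quoted size. One way to catch that locally is to validate Payload.txt before running curl. This sketch assumes the jsonlite package is installed; check_payload is a hypothetical name.

```r
# Hypothetical helper: stop with a parser message if Payload.txt is not
# valid JSON (e.g. "size":'10' instead of "size":10).
check_payload <- function(payload_file = "Payload.txt") {
  txt <- paste(readLines(payload_file, warn = FALSE), collapse = "\n")
  ok <- jsonlite::validate(txt)  # FALSE carries the parse error in attr "err"
  if (!ok) stop("Payload is not valid JSON: ", attr(ok, "err"))
  invisible(TRUE)
}
```

Valid JSON does not guarantee the GDC accepts the query (field names must still be correct), but it rules out quoting errors.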
A better solution: C: Sample names for TCGA data from GDC-legacy archive