Hi,
I have downloaded TCGA breast cancer data, a total of 1256 FASTQ files, and I have their UUIDs. I used the "Genomics Data Commons" package to get the TCGA barcodes for those UUIDs, but several UUIDs map to the same sample name. Which one should I pick for the analysis?
UUID samplenames
5516dd59-3d95-4bc6-84e7-5719b1bbcabf TCGA-A7-A26F-01B
a907f2d1-92ad-4a1b-b439-20e5a7347d5b TCGA-A7-A26F-01A
b570a72f-5e6c-4301-923b-9992662409ca TCGA-A7-A26F-01B
ba22d7e6-3e70-4a43-9dc1-59069b39e8c2 TCGA-A7-A26F-01B
eb068925-2dcc-4e18-838f-903ac8d2b661 TCGA-A7-A26F-01A
See: Different TCGA file IDs with same the Sample ID and Samples with same TCGA barcode in TCGA data
Yes, but for the GDC legacy data I don't see any aliquots like those given for the GDC harmonized data.
https://portal.gdc.cancer.gov/legacy-archive/files/a907f2d1-92ad-4a1b-b439-20e5a7347d5b
@Sean Davis Could you please help with this? With the "Genomics Data Commons" package I got the submitter IDs for the UUIDs, but there are duplicates. Which one should I pick? I don't even have the plate number to select the samples by. Is there a way to get the full TCGA barcode, like "TCGA-A6-6781-01A-22R-A278-07", from the UUIDs so that I can select based on plate numbers?
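To make the plate-based selection concrete, here is a minimal Python sketch of what I would do once the full aliquot barcodes are available. It only parses barcodes and does not query the GDC. The function names `parse_barcode` and `pick_one_per_sample` are hypothetical, the UUIDs in the usage example are placeholders, and the rule "keep the lexicographically greatest plate ID per sample" is just an assumption for illustration, not an established recommendation.

```python
import re
from collections import defaultdict

# Layout of a full TCGA aliquot barcode, e.g. TCGA-A6-6781-01A-22R-A278-07:
# project - tissue source site - participant - sample type + vial
# - portion + analyte - plate - center
BARCODE_RE = re.compile(
    r"^(?P<project>TCGA)-(?P<tss>[A-Z0-9]{2})-(?P<participant>[A-Z0-9]{4})"
    r"-(?P<sample_type>\d{2})(?P<vial>[A-Z])"
    r"-(?P<portion>\d{2})(?P<analyte>[A-Z])"
    r"-(?P<plate>[A-Z0-9]{4})-(?P<center>\d{2})$"
)


def parse_barcode(barcode):
    """Split a full aliquot barcode into its named fields."""
    match = BARCODE_RE.match(barcode)
    if match is None:
        raise ValueError(f"not a full TCGA aliquot barcode: {barcode!r}")
    return match.groupdict()


def pick_one_per_sample(uuid_to_barcode):
    """Return {sample barcode: chosen UUID}.

    Assumed rule (illustrative only): within each sample, keep the aliquot
    whose plate ID sorts last; ties fall back to barcode, then UUID.
    """
    by_sample = defaultdict(list)
    for uuid, barcode in uuid_to_barcode.items():
        fields = parse_barcode(barcode)
        # First four dash-separated parts form the sample name, e.g. TCGA-A7-A26F-01A
        sample = "-".join(barcode.split("-")[:4])
        by_sample[sample].append((fields["plate"], barcode, uuid))
    return {sample: max(entries)[2] for sample, entries in by_sample.items()}


if __name__ == "__main__":
    # Placeholder UUIDs; the second barcode is a hypothetical variant of the first.
    example = {
        "uuid-1": "TCGA-A6-6781-01A-22R-A278-07",
        "uuid-2": "TCGA-A6-6781-01A-22R-0000-07",
    }
    print(parse_barcode("TCGA-A6-6781-01A-22R-A278-07"))
    print(pick_one_per_sample(example))
```

With only the sample-level names (such as TCGA-A7-A26F-01B) the parser raises a `ValueError`, which is exactly the problem here: the plate field is not available until the full aliquot barcode can be recovered from the UUID.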
Tagging: Sean Davis