Hi,
I'm completely new to the gdc.cancer.gov portal and I need some help (I saw there are similar questions on this forum).
I need to download Gene Expression Quantification data (HTSeq-FPKM-UQ) for breast cancer and use it to classify cancer subtypes (luminal A, luminal B, HER2-like, basal-like).
To retrieve the labels I basically have 2 options (feel free to add more):
1) Get the sample ID in the 'old' TCGA barcode format (e.g. "TCGA-AR-A1AL-01") and use a dictionary, downloaded from an older article based on the same data, which maps each barcode directly to a subtype. The problem is that I have no idea how to get the TCGA barcode for each downloaded file, and the old API that did this no longer seems to work.
2) Also download the clinical data and check the fields related to ER, PgR and HER2 to assign labels manually. However, once I download the expression (EXP) data I basically lose all metadata, and I don't know how to join the two files (EXP, clinical) in order to assign labels. I know there must be a way of using the API to do what I need.
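To make option 1 concrete, here is a minimal sketch of the join I'm trying to end up with. Everything in it is a placeholder: the file names, the part of the barcode after "TCGA-AR-A1AL-01", and the subtype label are made up for illustration, and the `file_to_barcode` table is exactly the piece I don't know how to build from the portal. The only thing I'm assuming about the data is that a sample-level barcode like "TCGA-AR-A1AL-01" is the first 15 characters of the longer barcode.

```python
# Hypothetical table: downloaded file name -> full TCGA barcode.
# This is the mapping I don't know how to obtain; the values are made up.
file_to_barcode = {
    "example_file_1.FPKM-UQ.txt.gz": "TCGA-AR-A1AL-01A-21R-A12P-07",
    "example_file_2.FPKM-UQ.txt.gz": "TCGA-XX-0000-01A-11R-A000-07",
}

# Dictionary from the old article: sample-level barcode -> subtype.
# The label below is a placeholder, not the real subtype of this sample.
barcode_to_subtype = {
    "TCGA-AR-A1AL-01": "luminal A",
}


def sample_barcode(barcode):
    """Trim a TCGA barcode to sample level, e.g. 'TCGA-AR-A1AL-01'."""
    return barcode[:15]


def label_files(file_to_barcode, barcode_to_subtype):
    """Return file name -> subtype, or None when the sample is not in the dictionary."""
    labels = {}
    for file_name, barcode in file_to_barcode.items():
        labels[file_name] = barcode_to_subtype.get(sample_barcode(barcode))
    return labels


labels = label_files(file_to_barcode, barcode_to_subtype)
for file_name, subtype in labels.items():
    print(file_name, subtype)
```

Files whose sample is missing from the dictionary come back as None, so I could see how many samples I would lose with this approach.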
Can someone with more experience with the portal help me?
Thank you :)