I'm trying to analyze RNA-seq data from the TCGA-BRCA project. I've downloaded the STAR count TSV files as well as the clinical and sample manifests/metadata. The problem is that no column in "clinical.tsv" indicates the tissue sample type, so as far as I can tell there is no way to know whether the STAR counts come from normal or disease tissue.
The "gdc_sample_sheet" clearly indicates that some samples are disease and others are normal tissue, but the "case_submitter_id" in the clinical file has the last few characters clipped, and these are what identify the sample type. For instance, case_submitter_id 'TCGA-E2-A154' in the clinical.tsv file is 'TCGA-E2-A154-01A' in the gdc_sample_sheet, with "01A" indicating disease tissue. How do I know which RNA-seq files come from normal tissue, if any?
Go to https://portal.gdc.cancer.gov > Exploration > select TCGA and the cancer type you want > select the case ID to download its files, along with all the information you need about the sample.
Thanks! Somehow I missed the "File Name" column in the gdc_sample_sheet.
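
For anyone landing here later, the file-to-tissue mapping can be sketched in a few lines of Python. This is only a rough sketch under some assumptions: the sample sheet is tab-separated, and its columns are named "File Name" and "Sample ID" (check your own header; both names are parameters below). It relies on the TCGA barcode convention that the two digits in the fourth field give the sample type: 01-09 are tumor, 10-19 are normal, 20-29 are controls. The helper names `tissue_class` and `files_by_tissue` are made up for illustration.

```python
import csv
import io


def tissue_class(sample_barcode):
    """Classify a TCGA sample barcode such as 'TCGA-E2-A154-01A'."""
    parts = sample_barcode.strip().split("-")
    # A case-level ID like 'TCGA-E2-A154' has no sample field at all.
    if len(parts) < 4 or not parts[3][:2].isdigit() or len(parts[3]) < 2:
        return "unknown"
    code = int(parts[3][:2])  # e.g. '01A' -> 1; the trailing letter is the vial
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "unknown"


def files_by_tissue(sample_sheet_text, file_col="File Name", sample_col="Sample ID"):
    """Map each file name in a sample sheet to 'tumor'/'normal'/'control'/'unknown'.

    The column names are assumptions; pass your own if the header differs.
    """
    reader = csv.DictReader(io.StringIO(sample_sheet_text), delimiter="\t")
    mapping = {}
    for row in reader:
        # If a cell lists several barcodes, only the first one is used here.
        first_barcode = row[sample_col].split(",")[0]
        mapping[row[file_col]] = tissue_class(first_barcode)
    return mapping
```

To use it, read the sample sheet into a string (for example `open("gdc_sample_sheet.tsv").read()`, with whatever your file is actually called) and pass it to `files_by_tissue`; the files whose value is "normal" are the ones from normal tissue.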