Hello all,
I downloaded both RNA-Seq and clinical data from the TCGA website to combine the information and perform some data analysis. I am having some problems with the number of cases that have clinical data. For instance, for Adrenocortical carcinoma (ACC), the website reports 79 cases with mRNA data and 74 cases with clinical data. The problem is that when I combine all the clinical files I obtained from the website, i.e.,
nationwidechildrens.org_clinical_patient_acc.txt
nationwidechildrens.org_biospecimen_cqcf_acc.txt
nationwidechildrens.org_clinical_cqcf_acc.txt
I end up with a total of 88 unique samples with clinical information (for now I am only interested in the histological diagnosis). What am I doing wrong? Can someone help me with this "inconsistency"?
Thanks
Having more data than you expected doesn't sound like a problem to me.
Quite often, TCGA receives samples that fail one of the many QC steps needed to produce good-quality RNA-seq data, so a case can have clinical information but no expression data. Conversely, there may be some cases that go through the RNA-seq pipeline even though TCGA never received the related clinical data from the center/hospital that provided the sample... though I don't actually know of any such cases.
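To see which cases account for the difference, you could reduce every barcode to its patient-level part (the first 12 characters of a TCGA barcode, e.g. TCGA-XX-XXXX) and compare the two sets directly. This is a minimal sketch, assuming you have already pulled the barcode column out of the clinical files and the RNA-seq sample list into Python lists. The function names and the barcodes in the example are hypothetical placeholders, not real TCGA cases.

```python
# Sketch: reconcile clinical vs RNA-seq cases at the patient level.
# Assumes barcodes were already extracted from the files into lists;
# patient_id/reconcile are hypothetical helper names.

def patient_id(barcode: str) -> str:
    """Truncate a TCGA barcode (e.g. TCGA-XX-XXXX-01A-...) to its 12-char patient part."""
    return barcode.strip().upper()[:12]


def reconcile(clinical_barcodes, rnaseq_barcodes):
    """Split patient IDs into those in both sources and those in only one."""
    # Sets drop duplicates, e.g. the same patient listed in several clinical files
    clinical = {patient_id(b) for b in clinical_barcodes if b.strip()}
    rnaseq = {patient_id(b) for b in rnaseq_barcodes if b.strip()}
    return {
        "both": sorted(clinical & rnaseq),
        "clinical_only": sorted(clinical - rnaseq),
        "rnaseq_only": sorted(rnaseq - clinical),
    }


# Made-up placeholder barcodes, for illustration only
clinical_ids = ["TCGA-AA-0001", "TCGA-AA-0002", "TCGA-AA-0003", "TCGA-AA-0003"]
rnaseq_ids = [
    "TCGA-AA-0001-01A-11R-0000-07",
    "TCGA-AA-0002-01A-11R-0000-07",
    "TCGA-AA-0004-01A-11R-0000-07",
]

result = reconcile(clinical_ids, rnaseq_ids)
for key, ids in result.items():
    print(key, len(ids), ids)
```

Comparing at the patient level also guards against counting one patient several times when the clinical and biospecimen files use sample- or aliquot-level barcodes.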