Hello & good day,
I need help downloading the TCGA RNA-seq dataset. I have no background in bioinformatics; my background is in computer science. I'm trying to run the code provided here: https://github.com/luisvalesilva/multisurv/blob/master/data/preprocess_omics.ipynb
I have a problem mapping files to patients via cases.0.submitter_id.
When I downloaded the RNA-seq files using the manifest provided by the author (gdc_manifest.2019-08-23.txt), the mapping failed, I think because of changes made to the GDC portal since then. So I downloaded the updated version of the RNA-seq data from GDC instead. However, when I mapped the files to patients, I could map only around 1,000 patients, while around 8,000 could not be mapped.
Could you guide me on how to handle this?
Does that mean RNA-seq data are publicly available (access control = open) for only around 1,000 patients?
Your help is much appreciated.
Thank you
I'm not sure about the particular notebook you mentioned. But in general, once you have added these files to the GDC cart, you can download a sample sheet from the cart. This sample sheet is a TSV file that contains the exact file-to-patient mapping you are asking for.
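As a rough illustration, the sample sheet can be read with Python's standard csv module to build a file-to-patient lookup. This is a minimal sketch, assuming the sample sheet is tab-separated with the columns "File Name" and "Case ID" (where "Case ID" holds the TCGA patient barcode, e.g. TCGA-XX-XXXX); check the header of your own download before relying on it. The function name map_files_to_patients is hypothetical, not part of the notebook or any GDC tool.

```python
import csv
import io


def map_files_to_patients(sample_sheet):
    """Build {file name: patient barcode} from a GDC sample sheet.

    Assumes (hypothetically) a tab-separated file whose header includes
    the columns "File Name" and "Case ID".
    """
    reader = csv.DictReader(sample_sheet, delimiter="\t")
    mapping = {}
    for row in reader:
        mapping[row["File Name"]] = row["Case ID"]
    return mapping


# Example usage (the file name is a placeholder for your downloaded sheet):
# with open("gdc_sample_sheet.tsv", newline="") as f:
#     file_to_patient = map_files_to_patients(f)
# print(len(set(file_to_patient.values())), "distinct patients")
```

Counting the distinct values of the resulting dictionary gives the number of patients covered by the files in your cart, which is a quick way to check whether the low count comes from the mapping step or from the files themselves.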
Thank you, Zhenyu Zhang. I wasn't aware of the sample sheet.