I am very new to the field of cancer genomics, so this might be a very naive query.
I am interested in finding the dataset from the following study: "Whole-Genome Gene Expression Profiling of Formalin-Fixed, Paraffin-Embedded Tissue Samples". The data were submitted to GEO (accession number: GSE17599). However, I was not able to find this dataset on cBioPortal. I thought maybe cBioPortal had not yet added the data from TCGA, but I was not able to find it on TCGA either.
Is there any way I can find data using a GEO accession number in cBioPortal, TCGA, GDAC Firehose, etc.? Are there any other resources where I can find the datasets?
An R package or script that does this automatically would be very helpful.
Thanks in advance.
Perhaps you can better define what type of data you are looking for and why you want to use cBioPortal: to analyze cancer data, visualize it, or download it? Note that cBioPortal does not necessarily overlap with GEO. In any case, you can also find and analyze cancer data here: https://xenabrowser.net/datapages/
I can see that the link you mentioned lists cohort: GDC TCGA Prostate Cancer (PRAD) (11 datasets) and cohort: TCGA Prostate Cancer (PRAD) (22 datasets).
What I want to know is how I can find which datasets make up those 11 and 22 datasets. If I am not wrong, some of them, if not all, must be part of GEO.
Moreover, why is there a difference between the number of datasets in GDC TCGA Prostate Cancer (PRAD) and TCGA Prostate Cancer (PRAD)?
TCGA data is not in GEO.
The number of datasets is different because they are pulled from different sources. More importantly, the number of samples is basically the same. For RNA-seq, one has 550 samples and the other has 551 samples. One of the samples was probably removed at some point, potentially due to QC issues.
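If you want to see which sample accounts for a difference like 550 vs 551, you can compare the sample IDs of the two downloaded matrices. Below is a minimal sketch; the IDs are made-up placeholders, not real TCGA barcodes, and `compare_sample_ids` is just an illustrative helper name. In practice you would read the IDs from the header row of each expression file.

```python
def compare_sample_ids(first, second):
    """Return (IDs only in first, IDs only in second), each sorted."""
    first_set, second_set = set(first), set(second)
    only_first = sorted(first_set - second_set)
    only_second = sorted(second_set - first_set)
    return only_first, only_second


# Hypothetical sample IDs standing in for the column headers of two cohorts.
gdc_ids = ["SAMPLE-001", "SAMPLE-002", "SAMPLE-003"]
legacy_ids = ["SAMPLE-001", "SAMPLE-002", "SAMPLE-003", "SAMPLE-004"]

only_gdc, only_legacy = compare_sample_ids(gdc_ids, legacy_ids)
print("Only in first cohort:", only_gdc)
print("Only in second cohort:", only_legacy)
```

Any ID that shows up in only one list is a candidate for the sample that was dropped.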
That clears some of my doubts.
Many thanks to @igor for clarifying my doubts :)
How do you think I can proceed with the following task?
Can I use the data/results from cBioPortal together with datasets that are not part of cBioPortal, like GSE17599?
Note that GSE17599 holds gene expression data and not sequencing data, so it cannot be used to predict mutations.
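Since GSE17599 is only in GEO, you would fetch it from GEO directly. In R, the Bioconductor package GEOquery (`getGEO`) is the usual route. As a rough Python sketch, the function below builds the download URL for a series matrix file from an accession number, assuming the standard GEO FTP layout in which the last three digits of the accession are replaced by "nnn" in the parent directory. The helper names are my own, and series that span several platforms use a different file name (with a `-GPL...` suffix), so treat the result as a starting point and check the directory listing if the download fails.

```python
import re
import urllib.request

GEO_SERIES_ROOT = "https://ftp.ncbi.nlm.nih.gov/geo/series"


def geo_series_matrix_url(accession):
    """Build the expected series-matrix URL for a GEO series accession."""
    match = re.fullmatch(r"GSE(\d+)", accession.strip().upper())
    if match is None:
        raise ValueError(f"Not a GEO series accession: {accession!r}")
    digits = match.group(1)
    acc = "GSE" + digits
    # GEO groups series by thousands, e.g. GSE17599 sits under GSE17nnn.
    stub = "GSE" + digits[:-3] + "nnn"
    return f"{GEO_SERIES_ROOT}/{stub}/{acc}/matrix/{acc}_series_matrix.txt.gz"


def download_series_matrix(accession, destination):
    """Download the series matrix to a local path (requires network access)."""
    urllib.request.urlretrieve(geo_series_matrix_url(accession), destination)


if __name__ == "__main__":
    # Only prints the URL; call download_series_matrix() to fetch the file.
    print(geo_series_matrix_url("GSE17599"))
```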
If you are looking to find mutations in cancer, this is a good place to start: http://cancerhotspots.org/?ref=labworm#/download. Alternatively, you can download MAF files from TCGA or other studies.
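Once you have a MAF file, a first look is often just counting how many samples carry a mutation in each gene. Here is a minimal sketch, assuming a tab-separated MAF with the standard `Hugo_Symbol` and `Tumor_Sample_Barcode` columns. The example rows are invented for illustration and `mutated_samples_per_gene` is a hypothetical helper, not part of any library.

```python
import csv
from collections import defaultdict


def mutated_samples_per_gene(maf_text):
    """Count distinct samples with at least one mutation in each gene."""
    # MAF files may start with comment lines such as "#version 2.4".
    lines = [ln for ln in maf_text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    samples_by_gene = defaultdict(set)
    for row in reader:
        samples_by_gene[row["Hugo_Symbol"]].add(row["Tumor_Sample_Barcode"])
    return {gene: len(samples) for gene, samples in samples_by_gene.items()}


# Invented example rows; a real MAF has many more columns.
example_rows = [
    ["Hugo_Symbol", "Tumor_Sample_Barcode", "Variant_Classification"],
    ["TP53", "SAMPLE-A", "Missense_Mutation"],
    ["TP53", "SAMPLE-B", "Nonsense_Mutation"],
    ["SPOP", "SAMPLE-A", "Missense_Mutation"],
    ["TP53", "SAMPLE-A", "Silent"],
]
EXAMPLE_MAF = "#version 2.4\n" + "\n".join("\t".join(r) for r in example_rows)

counts = mutated_samples_per_gene(EXAMPLE_MAF)
for gene, n in sorted(counts.items()):
    print(gene, n)
```

Counting distinct samples rather than rows avoids inflating a gene's count when one sample has several variants in it.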
Next, I assume you will want to predict mutations in your own data, so you might want to check out this post
Basically, you cannot load your own data into cBioPortal, but there are some functions that you can run on your own data - see here