4.2 years ago
Ali
Hi everyone, I am using GDC TCGA data to train a machine learning model. Because my dataset has so few samples, I can't set aside a separate test set. So I want to get an RNA-seq AML test set from a different source.
Kind regards
Have you tried GEO or cBioPortal to see if they have any dataset you could use?
Thanks for your reply.
The TCGA data was easy to download and analyze, because the samples and genes were gathered in one CSV file. GEO is not organized the same way. I'm looking for something similar.
Kind regards
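GEO series usually ship the expression values and the sample annotations as separate files, so building a TCGA-like single table is mostly a join on sample ID. A minimal sketch using only the standard library, assuming (hypothetically) an expression TSV with genes as rows and samples as columns, and a sample-annotation TSV with `sample_id` and `class` columns that you prepared from the GEO metadata:

```python
import csv
import io


def merge_expression_and_labels(expr_tsv, meta_tsv, label_column="class"):
    """Return (header, rows): one row per sample, genes as columns, label last.

    expr_tsv: genes x samples text, first column = gene ID (assumed layout).
    meta_tsv: one row per sample with 'sample_id' and a label column (assumed layout).
    """
    expr_rows = list(csv.reader(io.StringIO(expr_tsv), delimiter="\t"))
    samples = expr_rows[0][1:]
    genes = [r[0] for r in expr_rows[1:]]
    values = [r[1:] for r in expr_rows[1:]]  # values[gene_index][sample_index]

    # Map each sample ID to its class label from the annotation file.
    labels = {
        rec["sample_id"]: rec[label_column]
        for rec in csv.DictReader(io.StringIO(meta_tsv), delimiter="\t")
    }

    header = ["sample_id"] + genes + [label_column]
    rows = []
    for j, sample in enumerate(samples):
        if sample not in labels:
            continue  # skip samples that have no clinical/annotation entry
        rows.append([sample] + [float(v[j]) for v in values] + [labels[sample]])
    return header, rows
```

The result can be written out with `csv.writer(...).writerows([header] + rows)` to get one CSV in the same shape as the TCGA file.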
If the model that you have developed is 'robust' (yes, that magic word again...), then it should replicate in another dataset, including from GEO. Perhaps think through your experimental design again.
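Replicating in another dataset in practice means feeding the external samples to the already-trained model, using exactly the genes and gene order it was trained on, and scoring the predictions against the external labels. A minimal sketch, where `predict` is a hypothetical wrapper around whatever trained model you have (one feature vector in, one class label out):

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple


def evaluate_external(
    predict: Callable[[List[float]], str],
    train_genes: Sequence[str],
    cohort: List[Tuple[Dict[str, float], str]],
):
    """Score a trained classifier on an independent cohort.

    predict: hypothetical wrapper around the trained model (one sample -> label).
    train_genes: gene order the model was trained on.
    cohort: (gene -> expression value, true_label) pairs from the external data.
    Returns (accuracy, confusion) where confusion counts (true, predicted) pairs.
    """
    confusion = Counter()
    for expr, truth in cohort:
        missing = [g for g in train_genes if g not in expr]
        if missing:
            raise ValueError(f"external sample lacks {len(missing)} training genes")
        x = [expr[g] for g in train_genes]  # same feature order as in training
        confusion[(truth, predict(x))] += 1
    correct = sum(n for (t, p), n in confusion.items() if t == p)
    accuracy = correct / len(cohort) if cohort else float("nan")
    return accuracy, confusion
```

Any normalization applied to the TCGA training data also has to be applied to the external data before this step; otherwise a poor score may reflect scaling differences rather than the model itself.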
To test the model I also need the clinical file, where I can specify the sample class. All the issues I am facing relate to the clinical file (sample information).
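One frequent source of trouble with clinical files is that they are keyed differently from the expression matrix. In TCGA, for example, the clinical table is keyed by patient barcode (the first three dash-separated fields, `TCGA-XX-XXXX`), while expression columns often carry the longer sample/aliquot barcode. A minimal sketch of matching the two, assuming TCGA-style barcodes and a `clinical` dict from patient barcode to the class label you chose; the barcodes in the test are made up:

```python
from typing import Dict, List, Tuple


def patient_id(barcode: str) -> str:
    """Reduce a TCGA sample/aliquot barcode to its patient barcode.

    The first three dash-separated fields identify the patient,
    e.g. 'TCGA-XX-0001-03A-01T-0001-13' -> 'TCGA-XX-0001'.
    """
    return "-".join(barcode.split("-")[:3])


def label_samples(
    sample_barcodes: List[str], clinical: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Attach a clinical class label to each expression column.

    clinical: patient barcode -> class label (hypothetical, built from the clinical file).
    Returns (labelled, unlabelled) so samples without clinical data are visible.
    """
    labelled, unlabelled = {}, []
    for bc in sample_barcodes:
        pid = patient_id(bc)
        if pid in clinical:
            labelled[bc] = clinical[pid]
        else:
            unlabelled.append(bc)
    return labelled, unlabelled
```

Returning the unlabelled samples explicitly, rather than silently dropping them, makes it easier to see how many test samples are lost for lack of clinical information.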
Great, but please provide a minimal reproducible example of the problem; otherwise, I can only speculate about the exact problem you are facing.
Your TCGA expression data was probably generated using the GDC RNA-Seq workflow. You should be able to generate a comparable dataset on your own by following the same workflow.
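Reprocessing raw reads through the same workflow is the most faithful option, but at minimum the external counts need the same normalization as the TCGA matrix. Depending on the data release, GDC provides raw counts alongside normalized values such as FPKM, FPKM-UQ or TPM, so check which one your file contains. As an illustration of that last step only, here is a minimal sketch of the standard counts-to-TPM conversion, assuming you have raw counts and gene lengths in bases for the same genes:

```python
from typing import Dict


def counts_to_tpm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Convert raw read counts to TPM (transcripts per million).

    counts: gene -> raw read count; lengths_bp: gene -> gene length in bases (> 0).
    TPM_g = (count_g / length_kb_g) / sum over genes of (count / length_kb) * 1e6
    """
    # Reads per kilobase for each gene (length-normalized rate).
    rate = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    total = sum(rate.values())
    if total == 0:
        raise ValueError("all counts are zero; TPM is undefined")
    # Scale so that the values for one sample sum to one million.
    return {g: r / total * 1e6 for g, r in rate.items()}
```

Which gene-length definition the GDC pipeline uses is not covered here, so small differences from the downloaded TCGA values are to be expected.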
Thanks for your reply. The analysis is a classification task in which samples are grouped by their clinical information, so any data I use to test the model must also have clinical information.
Regards
Realistically, it's going to be tough to find what you need. Since you're working with TCGA, you might want to try pan-cancer models first to increase the size of your dataset. Alternatively, you can dig through GEO for RNA-Seq data and look for the accompanying clinical information in the published manuscripts, assuming it is available.
Thanks, I will try.
Regards