I'm looking to train a machine learning model on transcriptomes from Alzheimer's disease (AD) samples versus non-AD samples. I want the model to differentiate between the two so that I can then learn which genes are most important for distinguishing them. I would also prefer to use only glial cell transcriptomes, but that is a preference rather than a requirement.
I think I need a dataset with hundreds, if not thousands, of samples. The datasets I have found on GEO each have fewer than 50 samples.
Does anyone have advice on how to do this with a small number of samples (<50), or know of a way to get the amount of data I need? Alternatively, I could study a different disease for which more samples are available.
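To make the goal concrete, here is a rough sketch of the kind of analysis I have in mind. It uses synthetic data in place of a real expression matrix (the sample count, gene count, and planted signal are all made up), and regularized logistic regression from scikit-learn is just one assumed choice of model, not something I'm committed to.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an expression matrix: 40 samples x 200 genes.
# All sizes and the planted signal below are assumptions for illustration.
rng = np.random.default_rng(0)
n_samples, n_genes, n_informative = 40, 200, 5
X = rng.normal(size=(n_samples, n_genes))
y = np.array([0] * 20 + [1] * 20)  # 0 = non-AD, 1 = AD

# Shift the first few genes in the "AD" group so there is something to find.
X[y == 1, :n_informative] += 3.0

# Scaling lives inside the pipeline so it is refit on each training fold,
# which avoids leaking test-fold information into the model.
model = make_pipeline(StandardScaler(), LogisticRegression(C=0.1, max_iter=1000))

# Stratified k-fold keeps the class balance in every fold, which matters
# when there are only a few dozen samples.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print("Mean cross-validated accuracy:", scores.mean())

# Refit on all samples and rank genes by absolute coefficient size.
model.fit(X, y)
coefs = model[-1].coef_.ravel()
top_genes = np.argsort(np.abs(coefs))[::-1][:n_informative]
print("Top-ranked gene indices:", top_genes)
```

With real data, `X` would be the normalized expression matrix and `y` the diagnosis labels. My worry is that with far more genes than samples, a ranking like this could be unstable from one split to the next, which is why I'm asking about sample size.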
Thank you