14 months ago
sil_bioinfo
Hello,
I would like to create a machine learning model where I could test different biomarkers to detect a certain disease. I have some data. Where should I start? Any help or advice is welcome.
Thank you in advance
You should start by defining your input and expected output as clearly as you can. 'I have "some data", I need a way to wrangle it till it does what I want it to do' is not a great starting point.
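To make "input and expected output" concrete, here is a minimal sketch of one common layout: a table with one row per sample and one column per biomarker, plus one label per sample. All sample names, marker names, values, and the "disease"/"control" labels below are made up for illustration; they are not taken from any real dataset.

```python
# Hypothetical input: one list of measurements per sample,
# in the same order as the biomarker names.
biomarkers = ["marker_1", "marker_2", "marker_3"]
measurements = {
    "sample_1": [5.2, 0.3, 7.1],
    "sample_2": [1.1, 0.4, 6.9],
    "sample_3": [4.8, 0.2, 7.3],
    "sample_4": [0.9, 0.5, 7.0],
}

# Hypothetical expected output: the known diagnosis of each sample.
diagnosis = {
    "sample_1": "disease",
    "sample_2": "control",
    "sample_3": "disease",
    "sample_4": "control",
}


def build_dataset(measurements, diagnosis, positive="disease"):
    """Return (sample_ids, X, y) with rows of X aligned to labels in y."""
    missing = set(measurements) - set(diagnosis)
    if missing:
        raise ValueError(f"samples without a label: {sorted(missing)}")
    sample_ids = sorted(measurements)
    X = [measurements[s] for s in sample_ids]  # input: samples x biomarkers
    y = [1 if diagnosis[s] == positive else 0 for s in sample_ids]  # output
    return sample_ids, X, y


sample_ids, X, y = build_dataset(measurements, diagnosis)
print(len(X), "samples,", len(biomarkers), "biomarkers, labels:", y)
```

Once the data is in this shape, the question "which biomarkers predict the label?" becomes something a model can actually be asked.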
The data I have to use is E-MTAB-7830: https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-7830?query=E-MTAB-7830 The idea is to use transcriptomics data from patients with active TB, patients with latent TB, and healthy individuals to find the combination of biomarkers that best differentiates patients with latent TB from patients with active TB. I have no idea where I should start.
I don't know much about machine learning, but your problem definition seems good. I think it should be possible to train a model on data from patients whose TB class is known and then use it to classify patients whose TB class is unknown. I'm not sure whether you ultimately want to focus on that (classification) or on selecting the features most predictive of each class (the "combination of biomarkers" that you refer to), but I think both can be done. I'd consult a machine learning expert once you have the per-class datasets and patient labels ready.
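As a rough illustration of how the two goals fit together, here is a minimal sketch using only the Python standard library. It runs on synthetic data, not on E-MTAB-7830, and the choice of methods (ranking genes by Welch's t-statistic, a nearest-centroid classifier, leave-one-out cross-validation) is my assumption about a reasonable baseline, not a recommendation specific to this dataset. The gene indices, group sizes, and effect size are all invented. In practice a library such as scikit-learn would replace the hand-written parts.

```python
import random
import statistics

random.seed(0)

N_GENES = 50
INFORMATIVE = {3, 17}  # hypothetical genes that differ between the classes


def simulate_sample(label):
    """Synthetic expression profile; label 0 = latent TB, 1 = active TB."""
    return [
        random.gauss(3.0 if (g in INFORMATIVE and label == 1) else 0.0, 1.0)
        for g in range(N_GENES)
    ]


labels = [0] * 20 + [1] * 20
profiles = [simulate_sample(lab) for lab in labels]


def welch_t(a, b):
    """Welch's t-statistic for two lists of values."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def top_genes(X, y, k):
    """Indices of the k genes with the largest |t| between the two classes."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 1]
        b = [row[g] for row, lab in zip(X, y) if lab == 0]
        scores.append((abs(welch_t(a, b)), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def nearest_centroid_predict(X, y, genes, sample):
    """Assign the class whose mean profile on the selected genes is closest."""
    best_label, best_dist = None, float("inf")
    for lab in sorted(set(y)):
        rows = [row for row, l in zip(X, y) if l == lab]
        centroid = [statistics.mean(r[g] for r in rows) for g in genes]
        dist = sum((sample[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = lab, dist
    return best_label


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy; genes are re-selected inside every fold."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        genes = top_genes(X_train, y_train, k)
        correct += nearest_centroid_predict(X_train, y_train, genes, X[i]) == y[i]
    return correct / len(X)


selected = top_genes(profiles, labels, k=2)
accuracy = loo_accuracy(profiles, labels, k=2)
print("selected genes:", sorted(selected))
print("leave-one-out accuracy:", accuracy)
```

The one design point worth keeping even if everything else changes: the gene selection is repeated inside each cross-validation fold. Selecting biomarkers on all samples first and only then cross-validating the classifier gives an over-optimistic accuracy, because the held-out sample has already influenced which genes were picked.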