Hey, I have RNA-seq data from patients who were infected with a virus. The transcriptomic data were collected from day 0 (before infection) to day 14, two weeks after infection. Of course, I have an RNA-seq expression matrix with samples in columns and genes in rows. I also know the day each sample was taken (day 0, day 1, and so on).
My goal is to use linear regression to predict the day from the data I have: a model in which the independent variables are the genes and the outcome (dependent variable) is the day number.
Is it possible to use linear regression for this task? If yes, can I get a hint on how to do it in R? If not, what should I use instead? It needs to be supervised machine learning.
Thank you.
You should consult a statistician about your data. No single ML model fits every case, and there are certain precautions you need to take when building a predictive model.
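Two precautions that commonly apply to a setup like this one: an RNA-seq matrix has far more genes than samples, so ordinary least squares with every gene as a predictor has no unique solution, and a penalized form of linear regression (here ridge, an L2 penalty) is a usual workaround. Also, samples from the same patient are correlated, so the model should be scored on patients it never saw during training. Below is a minimal sketch in Python with NumPy rather than R, assuming a genes x samples count matrix plus a per-sample patient ID. The helper names (`log_cpm`, `fit_ridge`, `leave_one_patient_out`), the log-CPM transform, the penalty `alpha=10.0` and the simulated counts are all illustrative assumptions, not the poster's data or a validated pipeline.

```python
import numpy as np

def log_cpm(counts):
    """Log2 counts-per-million; counts is a genes x samples array."""
    lib_size = counts.sum(axis=0, keepdims=True)  # total reads per sample
    return np.log2(counts / lib_size * 1e6 + 1.0)

def fit_ridge(X, y, alpha):
    """Ridge regression; X is samples x genes, y is the day per sample.

    Centering X and y keeps the intercept out of the penalty. The dual form
    w = Xc^T (Xc Xc^T + alpha*I)^-1 yc only solves an n x n system
    (n = number of samples), which is cheap when genes >> samples.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n = Xc.shape[0]
    dual = np.linalg.solve(Xc @ Xc.T + alpha * np.eye(n), yc)
    w = Xc.T @ dual
    b = y_mean - x_mean @ w
    return w, b

def leave_one_patient_out(counts, days, patients, alpha=10.0):
    """Predict each patient's days with a model trained on the other patients."""
    X = log_cpm(counts).T  # transpose to samples x genes
    y = np.asarray(days, dtype=float)
    patients = np.asarray(patients)
    preds = np.empty_like(y)
    for p in np.unique(patients):
        test = patients == p  # hold out every sample of this patient
        w, b = fit_ridge(X[~test], y[~test], alpha)
        preds[test] = X[test] @ w + b
    return preds

# Simulated data only (hypothetical): 4 patients, days 0..14, 500 genes,
# of which the first 20 rise with day.
rng = np.random.default_rng(0)
n_genes = 500
patients = np.repeat(["P1", "P2", "P3", "P4"], 15)
days = np.tile(np.arange(15), 4)
lam = np.full((n_genes, days.size), 50.0)
lam[:20] *= np.exp(0.15 * days)
counts = rng.poisson(lam)

preds = leave_one_patient_out(counts, days, patients, alpha=10.0)
rmse = float(np.sqrt(np.mean((preds - days) ** 2)))
print(f"leave-one-patient-out RMSE: {rmse:.2f} days")
```

In practice `alpha` would itself be chosen by cross-validation inside the training patients rather than fixed. In R, the glmnet package covers the same idea: `cv.glmnet` with `alpha = 0` gives ridge, and its `foldid` argument can be set so that each fold is one patient.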
This is a small sample of the RNA-seq counts data:
The metadata giving the day of each sample:
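Because the counts matrix has genes in rows and samples in columns, while the metadata lists one day per sample, the two have to be transposed and matched by sample name before any regression, so that row i of the predictors and entry i of the outcome describe the same sample. A minimal sketch in Python with NumPy, assuming the metadata can be read into a dict from sample name to day; `align_samples`, the sample names `S1`..`S3` and the toy counts are hypothetical. In R the same step is usually `t()` on the matrix plus `match()` on the sample names.

```python
import numpy as np

def align_samples(counts, sample_names, metadata):
    """Return (X, y) with one row of X and one entry of y per sample.

    counts: genes x samples array; sample_names: its column labels;
    metadata: dict mapping sample name -> day (an assumed format).
    """
    missing = [s for s in sample_names if s not in metadata]
    if missing:
        raise KeyError(f"no day recorded for samples: {missing}")
    X = np.asarray(counts, dtype=float).T  # samples x genes
    y = np.array([metadata[s] for s in sample_names], dtype=float)
    return X, y

# Toy example (hypothetical names and numbers): 2 genes x 3 samples,
# with the metadata deliberately listed in a different order.
counts = np.array([[10, 0, 5],
                   [3, 8, 1]])
sample_names = ["S1", "S2", "S3"]
metadata = {"S3": 14, "S1": 0, "S2": 1}
X, y = align_samples(counts, sample_names, metadata)
```

Matching by name rather than by position guards against the metadata rows being sorted differently from the matrix columns.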