Hello, I am currently reading a paper that uses both germline and somatic mutations to predict the drug response of a certain cancer cell line.
For that they use multivariate linear regression.
However, right now I have a really hard time picturing what the training data looks like. The variable they want to predict is the drug response (1-AUC), which should be a 1D array with values between 0 and 1.
But what does the input array look like? For a given cell line, is it also just a 1D array (in their case with 735 entries) with a 1 if the mutation occurred at that position and a 0 if that mutation was not observed in the cell line?
Honestly, it feels kind of weird to use this kind of binary input data (1 -> mutated site, 0 -> site not mutated) to predict a continuous variable. Am I missing something?
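To make the question concrete, this is roughly how I picture the setup. It is only a sketch with random placeholder data, not the paper's actual pipeline: the number of cell lines (200) and all variable names are my own assumptions, and only the 735 mutation features come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_cell_lines = 200   # assumed number of cell lines (not from the paper)
n_mutations = 735    # number of mutation features, as in the paper

# One row per cell line, one column per mutation:
# 1 = mutation observed in that cell line, 0 = not observed.
X = rng.integers(0, 2, size=(n_cell_lines, n_mutations))

# One drug response value (1 - AUC) per cell line; random placeholders here.
y = rng.uniform(0.0, 1.0, size=n_cell_lines)

# Ordinary least squares with an intercept column.
# With fewer cell lines than features the fit is not unique;
# np.linalg.lstsq then returns the minimum-norm solution.
X_design = np.column_stack([np.ones(n_cell_lines), X])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print(X.shape)     # one binary vector of length 735 per cell line
print(y.shape)     # one response value per cell line
print(coef.shape)  # intercept plus one coefficient per mutation
```

If I read it right, each coefficient would then be the estimated shift in 1-AUC when that mutation is present, with the other mutations held fixed.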
Any help is much appreciated!
Cheers.
I think you are describing QTL mapping. Using categorical variables in a linear (mixed) model is quite common. I'd suggest doing a Google search for these terms first, e.g.: