Hello,
I am not good with loops in R and have a challenging dataset to subset.
The dataframe has 17 rows and 18,000 columns.
The first 200 columns hold binary categorical values, and the remaining columns hold positive numerical values.
A representative dataframe is shown below:
View(df)
Drug_1 Drug_2 . . . Drug_200 Gene_1 Gene_2 . . . Gene_17800
Cell_1 1 1 . . . 1 3.410109 2.698543 . . . 2.991730
Cell_2 0 1 . . . 1 6.190569 2.785505 . . . 2.893962
Cell_3 1 1 . . . 0 5.503953 2.614325 . . . 2.787185
Cell_4 1 1 . . . 1 3.314800 2.685167 . . . 3.746460
Cell_5 0 1 . . . 1 3.702378 2.663557 . . . 5.541395
Cell_6 1 1 . . . 1 6.623338 2.623761 . . . 2.892601
Cell_7 0 0 . . . 1 3.855267 2.685530 . . . 2.879253
Cell_8 1 1 . . . 1 3.813186 2.741521 . . . 7.204914
Cell_9 1 1 . . . 0 4.010305 2.619892 . . . 2.930020
Cell_10 0 1 . . . 1 3.769854 2.831024 . . . 4.495060
Cell_11 0 1 . . . 0 4.325175 2.795230 . . . 3.181098
Cell_12 1 1 . . . 1 5.502184 2.691975 . . . 2.928878
Cell_13 1 0 . . . 1 5.711048 2.649376 . . . 2.897740
Cell_14 1 1 . . . 1 3.990681 2.719580 . . . 2.934628
Cell_15 1 0 . . . 1 5.650302 2.843495 . . . 3.025947
Cell_16 1 1 . . . 1 3.250378 2.498467 . . . 6.397197
Cell_17 1 1 . . . 1 5.366431 2.853150 . . . 5.033118
I want to explain the drug responses of the cells (1 or 0) for each drug by their respective gene expression levels (high or low) via logistic regression models. However, as a first step I have to select features (genes in my case). The structure of my data makes it difficult to apply common feature selection approaches.
To manually pick the features that drive opposite responses for each of the 200 drugs, I plan to write a nested loop over the drugs and, for each drug, subset the genes that are differentially expressed between the cells that respond and the cells that do not.
To illustrate: I want to subset the genes whose values differ (higher or lower) between the cells with response 0 and the cells with response 1, and I aim to do this for all 200 drugs in a loop.
I hope I have explained my problem clearly. Could you please help me set up a working loop?
Thanks in advance.
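For reference, here is a minimal sketch of the kind of loop described in the question. It is only one possible approach and not necessarily the one suggested in the thread. It assumes the columns are named `Drug_*` and `Gene_*` as in the table above, uses simulated data at a reduced (hypothetical) size, and swaps in a Wilcoxon rank-sum test with Benjamini-Hochberg adjustment as the criterion for "differentially expressed". The helper name `test_drug` and the 0.05 cutoff are illustrative choices.

```r
# Simulated stand-in for the real data (hypothetical sizes: 17 cells, 5 drugs, 50 genes).
set.seed(1)
n_cells <- 17; n_drugs <- 5; n_genes <- 50
drugs <- matrix(rbinom(n_cells * n_drugs, 1, 0.6), nrow = n_cells,
                dimnames = list(paste0("Cell_", 1:n_cells), paste0("Drug_", 1:n_drugs)))
genes <- matrix(rnorm(n_cells * n_genes, mean = 4), nrow = n_cells,
                dimnames = list(rownames(drugs), paste0("Gene_", 1:n_genes)))
df <- data.frame(drugs, genes)

drug_cols <- grep("^Drug_", names(df), value = TRUE)
gene_cols <- grep("^Gene_", names(df), value = TRUE)

# For one drug, compare every gene between responders (1) and non-responders (0).
test_drug <- function(drug) {
  resp <- df[[drug]]
  # Skip drugs where either group has fewer than 2 cells: nothing to compare.
  if (min(table(factor(resp, levels = c(0, 1)))) < 2) return(NULL)
  p <- vapply(gene_cols, function(g) {
    wilcox.test(df[[g]][resp == 1], df[[g]][resp == 0], exact = FALSE)$p.value
  }, numeric(1))
  mean_diff <- vapply(gene_cols, function(g) {
    mean(df[[g]][resp == 1]) - mean(df[[g]][resp == 0])
  }, numeric(1))
  data.frame(gene = gene_cols, mean_diff = mean_diff, p = p,
             p_adj = p.adjust(p, method = "BH"), stringsAsFactors = FALSE)
}

# One result table per drug; then keep the genes passing the (illustrative) cutoff.
results <- lapply(setNames(drug_cols, drug_cols), test_drug)
selected <- lapply(results, function(r) {
  if (is.null(r)) character(0) else r$gene[r$p_adj < 0.05]
})
```

With only 17 cells per drug, each test has little power, so the adjusted p-values are best read as a rough ranking. `selected[["Drug_1"]]` would then give the candidate genes for the first drug.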
Thanks a lot for your suggestion.
I followed your guidance and ran the modified loop, which took around 8 hours of computation on an i7 processor with 16 GB of RAM.
How should I interpret the output matrix, whose values lie in the range [-1, 1]? I presume they are beta values, so that a value closer to 1 means the gene affects the response positively and a value closer to -1 means the opposite. Or should I treat these values like p-values?
Read about lm() output, here:
As far as I can see, this post answers your original question ("how to do it"), and the above link should be enough to answer "how to interpret the results".
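To make the difference between coefficients and p-values concrete, here is a small sketch on simulated data (the variables `expr` and `resp` are made up for illustration). In the coefficient table returned by `summary()`, the estimate (beta) and the p-value are separate columns. A p-value always lies in [0, 1], so a matrix containing negative values cannot hold p-values.

```r
# Simulated example: one gene's expression and a binary response for 17 cells.
set.seed(1)
expr <- rnorm(17, mean = 4)
resp <- rep(c(0, 1), length.out = 17)

# Linear model: expression explained by response group.
fit <- lm(expr ~ resp)
coefs <- summary(fit)$coefficients   # columns: Estimate, Std. Error, t value, Pr(>|t|)
beta <- coefs["resp", "Estimate"]    # effect size; can be negative or positive
pval <- coefs["resp", "Pr(>|t|)"]    # p-value; always between 0 and 1

# A logistic model has the same table layout, with z statistics instead of t.
fit_glm <- glm(resp ~ expr, family = binomial)
coefs_glm <- summary(fit_glm)$coefficients
pval_glm <- coefs_glm["expr", "Pr(>|z|)"]
```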