Your data looks like this:
df
Compound conc Count conc.pred
1 A 3.720e-03 10200000 565.1263
2 A 1.368e-02 10300000 559.3499
3 A 5.004e-02 10200000 565.1263
4 A 1.836e-01 10200000 565.1263
5 A 6.732e-01 9927000 580.8959
6 A 2.472e+00 9400000 611.3376
7 A 9.048e+00 8464040 665.4025
8 A 3.324e+01 6513840 778.0542
9 A 1.212e+02 4349320 903.0858
10 A 4.464e+02 3071480 976.8991
11 A 1.632e+03 5989320 808.3526
12 A 6.000e+03 8444560 666.5278
What are you aiming to model? You cannot do anything with the compound column because it only has one value.
The only thing that we can do here is build a linear regression model between conc
and Count
:
model <- lm(conc ~ Count, data = df)
summary(model)
Call:
lm(formula = conc ~ Count, data = df)
Residuals:
Min 1Q Median 3Q Max
-781.9 -620.7 -565.1 -552.1 5333.5
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.154e+03 1.836e+03 0.629 0.544
Count -5.776e-05 2.176e-04 -0.265 0.796
Residual standard error: 1816 on 10 degrees of freedom
Multiple R-squared: 0.006999, Adjusted R-squared: -0.0923
F-statistic: 0.07049 on 1 and 10 DF, p-value: 0.796
It looks like there is no statistically significant relationship between these.
We can assess R^2 shrinkage using 3-fold cross-validation:
library(bootstrap)
theta.fit <- function(x, y) { lsfit(x,y) }
theta.predict <- function(fit, x) { cbind(1,x) %*% fit$coef }
x <- as.matrix(df[,which(colnames(df) %in% names(summary(model)$coefficients[,1]))])
y <- as.matrix(df[,c("conc")])
results <- crossval(x, y, theta.fit, theta.predict, ngroup = 3)
# Raw R^2
cor(y, model$fitted.values)**2
[,1]
[1,] 0.006999459
# cross-validated R^2
cor(y, results$cv.fit)**2
[,1]
[1,] 0.07150013
It does not look like a good model, in fairness.
We can also plot the model-predicted conc
values versus the real ones:
require(ggplot2)
df$conc.pred = predict(model)
ggplot(df, aes(x = conc, y = conc.pred)) +
geom_point() +
geom_smooth(method='lm',formula=y~x) +
stat_smooth(method="lm", fullrange=TRUE) +
ggtitle("My linear model")
The confidence intervals (grey shade) are extraordinary.
Finally, we can plot the ROC curve:
require(pROC)
roc <- roc(df$conc, fitted(model), smooth=FALSE)
plot.roc(
roc,
grid=TRUE,
grid.lwd=2,
col="royalblue",
main="My linear model")
text(0.3, 0.45,
col="royalblue",
paste("AUC (L95, AUC, U95):",
paste(round(ci.auc(roc)[1:3], 3), sep=" ", collapse=","), sep=" "),
cex=1.0)
Hopefully this can 'foment' some ideas for you.
If you have other compounds in your data and want to model them as the y variables in the model, then you can switch to using a generalised linear model (glm()
) in place of the linear model (lm()
).
Kevin
@Kevin Blighe Thanks Kevin, I thought that the concentration-response curves is based on the percent-viability scores which is often fit using smooth cubic splines for multivariate data . what I am mainly looking for is to calculate the AUC which should be only a value, right?
I wrote and re-wrote the response to your comment a few times and decided on this: you should have added much more information to your original question (or perhaps I should have requested it). In my very brief example that I give (above), if the model assumptions were correct, then you would have a single value for AUC.
You are talking about multivariate data, but what do you mean? You just have 3 columns in your data, and the
Compound
column is just all 'A' values.@Kevin Blighe, So, let me give more information and see your comments, I will absolutely like and accept your answer even if it does not lead me to a correct answer. So, lets assume that I have a cell line which I treat with a compound in several concentration and I obtain the sensitivity values. If I have a replicate, should I treat them as a multivariate calibration or just do it over each replicate separately?
There is no right or wrong, I think. You can try them separate and then together (in a multivariate analysis). In the multivariate analysis, you could probably add
cell-line
as a covariate to see if there are cell-line specific effects in relation to drug sensitivity, too.@Kevin Blighe could you please use the following data in your example? I am just wondering how you could do that
Hey, can you not simply repeat my same code with your new data?
@Kevin Blighe do you know the cell to cell interaction
@Kevin Blighe you wrote me this
to see if there are cell-line specific effects in relation to drug sensitivity
actually I have been reading a lot about this. no one uses lm for such a analysis. the worse one can do is liner because basically it is not linear but nonlinear and sigmoidal ! however, thanks for all your time