ROC analysis, is this ROC curve usual?
19 months ago
seta ★ 1.9k

Dear all,

I used the pROC library in R for ROC analysis. My response variable is binary and my independent variable is categorical, coded as 0, 1, 2. The analysis produced the ROC curve below, but it looks a bit strange, doesn't it? Why does the line not start from 0? Is this usual, or is something wrong?

[Image: ROC curve produced by pROC]

Thanks for sharing your comments!

AUC curve ROC • 1.2k views

Do you have a low sample number?


Hi Kevin,

No, the sample size is about 12,000, of which 35% are cases. Based on tutorials I've read, I used 80% of the samples for the training dataset and the rest for the test dataset. The proportion of cases is similar in both datasets. My independent variable is a SNP that I converted to 0, 1, 2 based on the number of effect alleles. Could that be the issue, or is it something else?


Oh, I see. We would sometimes see this kind of plot if the dataset were very small (~3 samples). Your dataset is large, so there must be some other issue; the fact that you have a binary outcome and an independent variable with just 3 levels is telling. Diagnosing the problem is difficult from here without seeing the input and output of every step. Can you share the code for the ROC curve and also the model fitting (glm())?
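To illustrate why a 3-level predictor is telling, here is a minimal base-R sketch. The data are simulated and purely hypothetical (the sample size, genotype frequencies, and effect size are assumptions, not values from this thread). It computes the ROC points by hand and shows that a predictor with three distinct values can yield at most four points, including the two corners.

```r
# Hypothetical simulated data: a binary outcome and a genotype coded 0/1/2.
set.seed(1)
n <- 1000
genotype <- sample(0:2, n, replace = TRUE, prob = c(0.5, 0.4, 0.1))
# Assumed effect: each extra allele raises the log-odds of being a case.
outcome <- rbinom(n, 1, plogis(-1 + 0.5 * genotype))

# ROC points by hand: call a sample a "case" when genotype >= threshold.
# Inf gives the (0, 0) corner; the lowest genotype gives the (1, 1) corner.
thresholds <- c(Inf, sort(unique(genotype), decreasing = TRUE))
roc_points <- t(sapply(thresholds, function(th) {
  pred <- genotype >= th
  c(threshold = th,
    fpr = sum(pred & outcome == 0) / sum(outcome == 0),
    tpr = sum(pred & outcome == 1) / sum(outcome == 1))
}))
print(roc_points)
```

With only two interior points, the curve is drawn as three straight segments, however large the sample is.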


Here is the code I used:

library(pROC)
library(caret)

df <- read.csv("data.csv")
head(df)
#    sample rs8176740 rs4962040 rs688976 rs529565 group
# 1 sample1         0         1        0        1     0
# 2 sample2         1         0        1        1     0
# 3 sample3         0         1        0        1     0
# 4 sample4         1         0        1        1     0
# 5 sample5         1         0        1        1     0
# 6 sample6         0         1        0        1     0

set.seed(132)
df <- df[sample(nrow(df)), ]  # shuffle the rows

# 80/20 train/test split
train_idx <- createDataPartition(df$group, p = 0.8, list = FALSE, times = 1)
train_data <- df[train_idx, ]
test_data <- df[-train_idx, ]
y_test <- test_data$group

# Case/control proportions are similar in both sets
prop.table(table(train_data$group))
#         0         1
# 0.6424394 0.3575606
prop.table(table(test_data$group))
#         0         1
# 0.6418721 0.3581279

# Logistic regression on a single SNP (allele count coded 0, 1, 2)
model <- glm(group ~ rs4962040, data = train_data, family = binomial)
y_pred <- predict(model, newdata = test_data, type = "response")

roc_data <- roc(y_test, y_pred)
# Setting levels: control = 0, case = 1
# Setting direction: controls < cases

auc_score <- auc(roc_data)
plot(roc_data, main = paste0("ROC Curve (AUC = ", round(auc_score, 2), ")"))

Is there any problem?

Thanks


I don't immediately see anything wrong. The unusual curve is probably just because everything is categorical: the outcome has two levels and the independent variable three (5 levels in total), so the model can produce at most three distinct predicted probabilities and the curve has only a few points.
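One way to check this is to inspect the thresholds pROC actually used. The sketch below uses simulated stand-ins for y_test and y_pred (hypothetical data, not the poster's), so only the shape of the output is meaningful. It also redraws the plot with legacy.axes = TRUE, which puts 1 - specificity on the x-axis running from 0 to 1. By default pROC plots specificity running from 1 to 0, which might be part of why the line does not appear to start from 0.

```r
library(pROC)

# Hypothetical stand-ins for y_test and y_pred from the thread:
# a binary outcome and predicted probabilities that take only three values.
set.seed(1)
genotype <- sample(0:2, 500, replace = TRUE)
y_test <- rbinom(500, 1, plogis(-1 + 0.5 * genotype))
y_pred <- plogis(-1 + 0.5 * genotype)

roc_data <- roc(y_test, y_pred, levels = c(0, 1), direction = "<", quiet = TRUE)

# Number of distinct scores; the curve can only bend at cut-offs between them.
length(unique(y_pred))

# Thresholds with the sensitivity and specificity at each one.
data.frame(threshold = roc_data$thresholds,
           sensitivity = roc_data$sensitivities,
           specificity = roc_data$specificities)

# Conventional orientation: x-axis is 1 - specificity, from 0 to 1.
plot(roc_data, legacy.axes = TRUE)
```

If the table has only a handful of rows, the angular shape is expected for this kind of predictor and does not by itself indicate a coding error.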
