RStudio hangs on logistic regression using generalized estimating equations (geepack::geese())
5 months ago

Hi biostars,

I hope someone can help me figure out this very weird problem, because I can't work out why it happens.

I have run a logistic regression using:

library(geepack)

model <- geese(
  formula = dm_ever ~ AGE + AGE.2 + SEX + BMI + X10.68298228,
  data    = df_temp,
  id      = PAT,
  corstr  = "exchangeable",
  family  = binomial()
)

I have successfully run it on thousands of SNPs (X10.68298228 is a SNP) without any problem, but for a few SNPs RStudio just hangs and becomes unresponsive until I have to terminate R and restart it: no errors, no messages, nothing. The only thing that changes between tests is the SNP column; dm_ever, AGE, AGE.2, SEX and BMI stay the same.
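
Since only the SNP column changes between runs, one cheap check worth ruling out (a suggestion, not a diagnosis) is whether the problematic SNP has empty genotype-by-outcome cells, for example a rare genotype observed only among cases. In logistic regression such empty cells can lead to (quasi-)complete separation, where a coefficient diverges and the fit fails to converge. The toy data below is hypothetical; with the real data the equivalent table would be table(df_temp$X10.68298228, df_temp$dm_ever).

```r
# Hypothetical toy data standing in for df_temp: a 0/1 outcome and a SNP coded 0/1/2.
toy <- data.frame(
  dm_ever = c(0, 1, 0, 0, 1, 0, 1, 1, 1, 0),
  snp     = c(0, 0, 0, 0, 0, 0, 0, 1, 2, 0)
)

# Genotype-by-outcome counts.
tab <- table(genotype = toy$snp, dm_ever = toy$dm_ever)
print(tab)

# Zero cells: genotypes never observed together with a given outcome value.
# Here genotypes 1 and 2 occur only among cases (dm_ever == 1).
empty <- which(tab == 0, arr.ind = TRUE)
print(empty)
```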

The SNP in the code above is one example where this happens. I have attached test_hangup.Rdata so you can run it and try for yourself.

test_hangup.Rdata

I would truly appreciate it if someone could help me solve this issue!

best
Jonas

Additional info:

I figured out that if I remove rows 3203 and 3212, the model runs without any problem. These are the two rows: [screenshot]

I don't know if that helps, but I find it very strange.

Please do not ask people to access random files stored on Google Drive. Use reproducible examples (reprexes) instead.

I have no idea how to do that, since the program crashes specifically on this dataset :) Any advice? I can't paste the whole dataset of 3212 rows.

I'd say include those 2 problematic rows (kudos for zeroing in on them BTW, that was a huge step in the right direction) with maybe 20 other rows and share just that subset. You don't need us to give you the exact answer, just figure out why those two rows are making it crash, right?
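
Following that suggestion, a minimal sketch of a shareable subset: the two suspect rows plus 20 randomly chosen others, printed with dput() so the text can be pasted into a post and turned back into a data frame by anyone reading it. df_toy is a hypothetical stand-in for df_temp; only the column names and the row numbers 3203 and 3212 come from the thread, and all values are made up.

```r
# Hypothetical stand-in for df_temp: same column names as in the post, made-up values.
set.seed(1)
n <- 3212
df_toy <- data.frame(
  PAT          = rep(seq_len(n %/% 2), each = 2)[seq_len(n)],  # 2 rows per patient
  dm_ever      = rbinom(n, 1, 0.2),
  AGE          = round(runif(n, 30, 80)),
  SEX          = rbinom(n, 1, 0.5),
  BMI          = round(rnorm(n, 27, 4), 1),
  X10.68298228 = rbinom(n, 2, 0.05)
)
df_toy$AGE.2 <- df_toy$AGE^2  # AGE squared here purely for illustration

# The two rows singled out in the thread, plus 20 other randomly chosen rows.
suspect   <- c(3203, 3212)
others    <- sample(setdiff(seq_len(n), suspect), 20)
reprex_df <- df_toy[c(suspect, others), ]

# Paste the printed text into the post; it rebuilds the subset when run.
dput(reprex_df)
```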

I have investigated some more and it just gets weirder and weirder. geese() hangs when I run it on all 3212 rows of data, but if I exclude any two rows from the data set it suddenly works. So it doesn't seem to be about these exact rows, nor about the data set being too large, since I have run many others that are bigger. I have no idea what is going on; could it be some weird bug? I have also tried loading only library(geepack) to rule out conflicts with other packages, but it made no difference.
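
Each hanging trial costs a restart of R. A sketch for avoiding that, assuming the callr and geepack packages are installed: run geese() in a separate R process with a time limit via callr::r(), so a stuck fit times out instead of freezing RStudio. The helper name fit_geese_with_timeout and the 30-second default are hypothetical choices. Any error in the child process, including the timeout, is reported as a message and turned into NULL, so NULL means "did not finish" rather than specifically "hung".

```r
# Hypothetical helper: fit geese() in a child R process with a time limit.
# Requires the callr and geepack packages to be installed.
fit_geese_with_timeout <- function(formula, data, seconds = 30) {
  tryCatch(
    callr::r(
      function(formula, data) {
        # Same model as in the question; PAT is looked up inside `data`.
        geepack::geese(formula, data = data, id = PAT,
                       corstr = "exchangeable", family = stats::binomial())
      },
      args = list(formula = formula, data = data),
      timeout = seconds
    ),
    error = function(e) {
      # A timeout surfaces here as an error, as does any error raised by geese().
      message("Fit did not complete: ", conditionMessage(e))
      NULL
    }
  )
}

# Usage with the objects from the question (not run here):
# fit <- fit_geese_with_timeout(dm_ever ~ AGE + AGE.2 + SEX + BMI + X10.68298228, df_temp)
# if (is.null(fit)) message("geese() did not finish within the time limit")
```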

How much RAM does the machine you're using have?

I have 32 GB on my local machine, but I also run it on an HPC cluster where I have around 100 GB if necessary. It doesn't seem to be a RAM issue either: the function hangs immediately when I run it, whereas if I remove any two rows it completes in just a few seconds.

That is really odd. I'll see if I can reproduce this with the Rdata file you uploaded. You were right and I was wrong: we do need your whole file to help you with this issue.

No problem :) I really appreciate your help! Have you been able to figure anything out?

Sorry, I haven't had a chance to look yet - I've been busy. I'll take a look today.

On my machine, it ran to completion/error in <1 second:

ls()
[1] "df_temp"

dim(df_temp)
[1] 3212    7

geepack::geese(formula = dm_ever ~ AGE + AGE.2 + SEX + BMI + X10.68298228,
               data    = df_temp,
               id      = PAT,
               corstr  = "exchangeable",
               family  = binomial())

Call:
geepack::geese(formula = dm_ever ~ AGE + AGE.2 + SEX + BMI +
    X10.68298228, id = PAT, data = df_temp, family = binomial(),
    corstr = "exchangeable")

Mean Model:
 Mean Link:                 logit
 Variance to Mean Relation: binomial

 Coefficients:
  (Intercept)           AGE         AGE.2           SEX           BMI X10.682982281
   -3.388e+00     4.402e-03    -2.915e-06    -2.557e-01     8.708e-02    -4.504e+15

Scale Model:
 Scale Link:                identity

 Estimated Scale Parameters:
(Intercept)
     0.9854

Correlation Model:
 Correlation Structure:     exchangeable
 Correlation Link:          identity

 Estimated Correlation Parameters:
 alpha
0.2561

Returned Error Value:  1
Number of clusters:   1709   Maximum cluster size: 13

My sessionInfo():

R version 4.3.2 (2023-10-31)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.2.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Chicago
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] backports_1.4.1  tidyr_1.3.1      utf8_1.2.4       R6_2.5.1         tidyselect_1.2.1 magrittr_2.0.3   glue_1.7.0       tibble_3.2.1     pkgconfig_2.0.3  dplyr_1.1.4      generics_0.1.3   lifecycle_1.0.4  cli_3.6.2
[14] fansi_1.0.6      vctrs_0.6.5      compiler_4.3.2   purrr_1.0.2      tools_4.3.2      broom_1.0.5      pillar_1.9.0     geepack_1.3.11   rlang_1.1.3      MASS_7.3-60.0.1

Do you mean it errored with all data points and ran to completion when you removed some? Or what do you mean by "completion/error"? Sorry if I'm being obtuse here :)

I ran the exact code you provided on the exact Rdata file you provided, as you can see from the dim output. It ran to completion.

Aha, but why does it say "Returned Error Value: 1"? I have now tried it on three different machines and it hangs on all of them. Might it have to do with versions or something?

why does it say returned error value: 1?

I don't know but it does not hang on my machine. Please investigate what this "error value" could mean.
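
A rough sketch for flagging such fits automatically when looping over thousands of SNPs. It assumes, as the printed output suggests, that the object returned by geese() stores the "Returned Error Value" in `$error` and the coefficients in `$beta`; the thread does not establish what a value of 1 means, so the sketch simply treats any non-zero value as suspect. The helper name and the 1e6 cut-off for an implausibly large coefficient are hypothetical choices, and the stand-in objects are plain lists, not real geese fits. A non-zero error value together with a coefficient like the -4.504e+15 above is at least consistent with a fit that did not converge.

```r
# Hypothetical helper: TRUE if a geese()-style fit looks untrustworthy.
# Assumes the fit has `$error` (returned error value) and `$beta` (coefficients).
fit_looks_suspicious <- function(fit, max_abs_coef = 1e6) {
  nonzero_error <- !is.null(fit$error) && fit$error != 0
  huge_coef     <- any(!is.finite(fit$beta) | abs(fit$beta) > max_abs_coef)
  nonzero_error || huge_coef
}

# Stand-in objects (plain lists); ok_fit has made-up values,
# bad_fit reuses the SNP coefficient and error value printed above.
ok_fit  <- list(error = 0L, beta = c(-3.4, 0.004, -0.26))
bad_fit <- list(error = 1L, beta = c(-3.4, 0.004, -4.504e+15))

fit_looks_suspicious(ok_fit)   # FALSE
fit_looks_suspicious(bad_fit)  # TRUE
```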

might have to do with versions or something?

This is why I provided my sessionInfo(), so you can compare versions.
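
For that comparison, a small sketch that collects the pieces most likely to differ between machines: the R version, the geepack version, and the BLAS/LAPACK libraries (the sessionInfo() above shows R's reference BLAS and LAPACK 3.11.0). Running it on each machine and comparing the output narrows down what to align; which of these, if any, matters for the hang is not established in the thread. The helper name pkg_version is hypothetical.

```r
# Collect version information that commonly differs between machines.
pkg_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_  # package not installed
  }
}

si <- utils::sessionInfo()
versions <- c(
  R       = R.version.string,
  geepack = pkg_version("geepack"),
  BLAS    = if (is.null(si$BLAS)) NA_character_ else unname(si$BLAS),
  LAPACK  = unname(La_version())
)
print(versions)
```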

Yes, thanks for that! I'm comparing and updating my packages one at a time to see if that helps :)

That doesn't seem to work :( Do you think you could create an renv.lock file from your working session that I could try?

Did you compare R versions? I'm not comfortable sharing my environment, plus I'd need to recreate it.

Also, I used R --vanilla when I tested your dataset and code, if that helps.
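
R --vanilla starts R without the site and user profile files, without .Renviron, and without restoring a saved .RData workspace, whereas an RStudio session normally reads them. A rough sketch for listing which of the usual user-level startup files exist on a machine where the fit hangs, as candidates for what differs from a vanilla session. It is not exhaustive (site-wide files such as Rprofile.site are not checked), and the helper name env_or is hypothetical.

```r
# User-level startup files that `R --vanilla` skips (not an exhaustive list).
env_or <- function(var, default) {
  value <- Sys.getenv(var)
  if (nzchar(value)) value else default
}

startup_files <- c(
  project_profile = ".Rprofile",  # in the current working directory
  user_profile    = env_or("R_PROFILE_USER", file.path("~", ".Rprofile")),
  user_environ    = env_or("R_ENVIRON_USER", file.path("~", ".Renviron")),
  saved_workspace = ".RData"      # restored at startup unless --no-restore
)

# Keep only the files that actually exist on this machine.
present <- startup_files[file.exists(startup_files)]
print(present)
```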

Ah, I see, no worries! It's really weird: I tried exactly the same setup with all the versions you ran (as far as I can tell), but it still didn't work.
