Hi biostars,
I hope someone can help me to figure out this very weird problem. I can't figure out why this happens.
I have run logistic regression using:
model <- geese(
  formula = dm_ever ~ AGE + AGE.2 + SEX + BMI + X10.68298228,
  data = df_temp,
  id = PAT,
  corstr = "exchangeable",
  family = binomial()
)
I have successfully run it on thousands of SNPs (X10.68298228 is a SNP) without any problem, but for a few SNPs RStudio just hangs and becomes unresponsive until I have to terminate R and restart it. No errors, no nothing. The only thing that changes between tests is the SNP column; dm_ever, AGE, AGE.2, SEX and BMI don't change.
The SNP in the code is an example of what happens. I attach test_hangup.Rdata for you to run and try.
I would truly appreciate if someone could help me solve this issue!
best
Jonas
additional info:
I figured out that if I remove lines 3203 and 3212, the model runs without any problem. These are the two lines:
I don't know if that helps, nevertheless I find it very strange.
Please do not ask people to access random files stored on Google drive. Use reproducible examples/reprexes instead.
I have no idea how to do that, since the program crashes specifically on this dataset :) Any advice? I can't insert the whole dataset of 3212 rows.
I'd say include those 2 problematic rows (kudos for zeroing in on them BTW, that was a huge step in the right direction) with maybe 20 other rows and share just that subset. You don't need us to give you the exact answer, just figure out why those two rows are making it crash, right?
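A minimal sketch of how such a subset could be built and shared as text. The data frame below is a simulated stand-in for df_temp (the real data is not shown in this thread), and the names problem_rows, other_rows and df_small are only illustrative:

```r
# Simulated stand-in for the real df_temp (3212 rows); all values are made up.
set.seed(1)
n <- 3212
df_temp <- data.frame(
  PAT = rep(seq_len(n / 4), each = 4),  # assumption: 4 rows per patient
  dm_ever = rbinom(n, 1, 0.2),
  AGE = round(runif(n, 30, 80)),
  SEX = rbinom(n, 1, 0.5),
  BMI = round(rnorm(n, 27, 4), 1),
  X10.68298228 = sample(0:2, n, replace = TRUE)
)
df_temp$AGE.2 <- df_temp$AGE^2

# Keep the two suspicious rows plus 20 randomly chosen other rows.
problem_rows <- c(3203, 3212)
other_rows <- sample(setdiff(seq_len(nrow(df_temp)), problem_rows), 20)
df_small <- df_temp[sort(c(problem_rows, other_rows)), ]

# dput() prints R code that recreates df_small, so it can be pasted into a post.
dput(df_small)
```

Whether the hang still shows up on such a small subset is a separate question; if it does not, that is useful information in itself.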
I have investigated some more and it just gets weirder and weirder. geese() hangs when I run it with all 3212 rows of data, but apparently if I exclude any 2 rows from the data set it suddenly works. It doesn't seem to have anything to do with these exact rows, nor with the data set being too large, since I have run many others that are bigger. I have no idea what the heck is going on, must be some weird bug? I have tried loading only library(geepack) to rule out any weird conflict with other packages, but it makes no difference.
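Before blaming a bug, it may be worth ruling out a few properties of the input. The geepack documentation states that the data are assumed to be sorted so that observations from the same cluster are in contiguous rows, so that is one thing to check, along with missing values in the model variables. The helper below, check_gee_input, is hypothetical (it is not part of geepack) and the toy data is made up; nothing here is claimed to be the cause of the hang:

```r
# Hypothetical helper (not part of geepack): basic sanity checks on GEE input.
check_gee_input <- function(df, id, vars) {
  ids <- as.character(df[[id]])
  runs <- rle(ids)$values  # one entry per run of identical consecutive ids
  list(
    n_rows = nrow(df),
    n_clusters = length(unique(ids)),
    # TRUE when every id occupies a single contiguous block of rows
    ids_contiguous = anyDuplicated(runs) == 0L,
    # number of missing values in each model variable
    n_missing = colSums(is.na(df[, vars, drop = FALSE])),
    # rows with no missing value in the id or any model variable
    n_complete = sum(complete.cases(df[, c(id, vars), drop = FALSE]))
  )
}

# Made-up toy data: patient 1 appears in two separate blocks, one outcome is NA.
toy <- data.frame(
  PAT = c(1, 1, 2, 2, 1, 3),
  dm_ever = c(0, 1, 0, 0, 1, NA),
  X10.68298228 = c(0, 1, 2, 0, 1, 1)
)
res <- check_gee_input(toy, id = "PAT", vars = c("dm_ever", "X10.68298228"))
str(res)
```

On the real data the call would be something like check_gee_input(df_temp, "PAT", c("dm_ever", "AGE", "AGE.2", "SEX", "BMI", "X10.68298228")). If ids_contiguous comes back FALSE, sorting with df_temp[order(df_temp$PAT), ] before fitting would be the obvious thing to try.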
How much RAM does the machine you're using have?
I have 32 GB on my local machine, but I also run it on an HPC where I have some 100 GB if necessary. It doesn't seem to be a RAM issue either. The function hangs immediately when I run it; if I remove any two rows, it takes just a few seconds to complete.
That is really odd. I'll see if I can reproduce this with the Rdata file you uploaded - you were right and I was wrong, we do need your whole file to help you with this issue.
np:), really appreciate your help! have you been able to figure something out?
Sorry, I haven't had a chance to look yet - I've been busy. I'll take a look today.
On my machine, it ran to completion/error in <1 second:
My sessionInfo():
do you mean it errored with all datapoints and ran to completion when you removed any? or what do you mean with completion/error? sorry if I'm obtuse here:)
I ran the exact code you provided on the exact Rdata file you provided, as you can see from the dim output. It ran to completion.
aha, but why does it say returned error value: 1? I have tried it on 3 different machines now and it hangs on all, might have to do with versions or something?
I don't know but it does not hang on my machine. Please investigate what this "error value" could mean.
This is why I provided my sessionInfo(), so you can compare versions.
Yes thx for that, I'm comparing and updating my packages one at a time to see if there is any luck:)
Doesn't seem to work:( Do you think you could create a renv.lock file from your working session that I could try?
Did you compare R versions? I'm not comfortable sharing my environment, plus I'd need to recreate it.
Also, I used R --vanilla when I tested your dataset and code, if that helps.
ah, I see, no worries! It's really weird, I tried exactly the same with all the versions you ran (as far as I can tell) but it still didn't work.
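For the version comparison discussed above, a compact report can be easier to diff between machines than a full sessionInfo() dump. This is only a sketch: version_report is a hypothetical helper, and which packages are worth listing is an assumption.

```r
# Hypothetical helper: collect the versions relevant for a side-by-side comparison.
version_report <- function(pkgs) {
  pkg_versions <- vapply(pkgs, function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_  # package not installed on this machine
    }
  }, character(1))
  list(
    r_version = R.version.string,
    platform = R.version$platform,
    packages = pkg_versions
  )
}

# geepack is the package of interest here; "stats" is listed only as an example.
report <- version_report(c("geepack", "stats"))
print(report)
```

Running this in a session started with R --vanilla on each machine keeps startup files and saved workspaces out of the comparison.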