3.7 years ago
camillab.
Hi!
I am trying to run a linear regression of gene expression on age and sex, in a loop, for a dataset with 6 samples and 16757 columns (4 metadata columns followed by the genes). This is my dataset (I copied only the first columns):
'data.frame': 6 obs. of 16757 variables:
$ samples : chr "hu-c_lab13" "hu-c_lab15" "hu-c_lab17" "hu-gent_lab14" ...
$ treatment : chr "untreated" "untreated" "untreated" "treatment" ...
$ sex : chr "Male" "Female" "Male" "Female" ...
$ age : num 45 56 46 65 21 75
$ 7SK (i) : num 87779 79828 64005 44973 42646 ...
I want to loop over the genes to test whether age and sex affect gene expression, and I want to obtain the fitted.values for each gene:
prova$treatment <- factor(prova$treatment, levels=c("treatment","untreated"))
prova$sex <- factor(prova$sex, levels=c("Female","Male"))
prova$age <- as.numeric(prova$age)
genelist <- prova %>% select(5:16757) #select genes
for (i in 1:length(genelist)) {
formula <- as.formula(paste("samples ~ ", genelist[i], " + age + sex ", sep=""))
model <- glm(formula, data = prova)
print(model[["fitted.values"]])
}
But it gives me:
Error in y - mu : non-numeric argument to binary operator
What am I doing wrong in the loop?
Also, if I do it for a single gene, it works:
model2 <- lm(ENSG00000202198 ~ sex + age , data=prova)
summary(model2)
model$fitted.values <- predict(model2)
gene <- model2[["fitted.values"]]
gene <- as.data.frame(gene)
Thank you
Camilla
Like this? (Apologies, I am really bad with loops...)
Also, I checked whether any of the gene columns I am interested in are character (sapply(prova[5:16753], class)), and none of them are. Could it have something to do with the fact that sex is a character?
Other than non-numeric values, you can also check for infinite ones. This loop should show the name of the last gene that caused the problem, which should allow you to debug it. (It is also possible that your samples column is non-numeric.)
Sam
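A minimal sketch of a loop along these lines, not the original code from this thread. It assumes the intent is the same as in the single-gene lm() call, i.e. the gene is the response and age and sex are the predictors. In the loop from the question the response is samples, which str() shows as character, and that would be consistent with the "non-numeric argument" error. The toy data frame, its gene names (geneA, "gene B (i)", geneC) and all expression values are made up for illustration; only the ages come from the post.

```r
# Toy stand-in for `prova`; gene names and expression values are hypothetical.
prova <- data.frame(
  samples   = paste0("s", 1:6),
  treatment = rep(c("untreated", "treatment"), each = 3),
  sex       = factor(c("Male", "Female", "Male", "Female", "Male", "Female"),
                     levels = c("Female", "Male")),
  age       = c(45, 56, 46, 65, 21, 75),
  stringsAsFactors = FALSE
)
prova[["geneA"]]      <- c(10, 12, 9, 15, 7, 16)
prova[["gene B (i)"]] <- c(200, 180, 210, 150, 260, 140)  # non-syntactic name
prova[["geneC"]]      <- c(1, 2, Inf, 4, 5, 6)            # contains an infinite value

genes <- names(prova)[5:ncol(prova)]  # gene columns start at column 5
fitted_list <- list()
skipped <- character(0)

for (g in genes) {
  y <- prova[[g]]
  # Report and skip columns that are non-numeric or contain NA/NaN/Inf.
  if (!is.numeric(y) || any(!is.finite(y))) {
    message("Skipping ", g, ": non-numeric or non-finite values")
    skipped <- c(skipped, g)
    next
  }
  # Build `<gene> ~ age + sex`; as.name() copes with names such as "gene B (i)".
  f <- reformulate(c("age", "sex"), response = as.name(g))
  fit <- tryCatch(
    lm(f, data = prova),
    error = function(e) {
      message("Model failed for ", g, ": ", conditionMessage(e))
      NULL
    }
  )
  if (!is.null(fit)) fitted_list[[g]] <- fitted(fit)
}

# One column of fitted values per successfully modelled gene.
fitted_mat <- do.call(cbind, fitted_list)
```

Iterating over column names rather than over genelist[i] keeps the gene name available for messages, so a problematic column can be identified directly.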