Entering edit mode
14 months ago
Manuel Sokolov Ravasqueira
▴
110
I did the limma voom pipeline to normalize the expression using quantile normalization. I want to discretize the expression of each gene into "high" or "low".
This is my current approach:
table <- data.frame(row.names = names, pvalue = numeric(length(names)))
# Loop through the genes
for (i in seq_along(names)) {
normalized_expression_new <- normalized_expression
gene <- names[i]
gene_exp <- normalized_expression_new[,c(gene)]
low <- mean(gene_exp) - 1 * sd(gene_exp)
high <- mean(gene_exp) + 1 * sd(gene_exp)
normalized_expression_new$DISCRETE_GENE <- ifelse(normalized_expression[,c(gene)] <= low, "low",
ifelse(normalized_expression[,c(gene)] >= high, "high", "mid")
)
formula_string <- paste("Surv(TIME_TO_DEATH, MORTE) ~ DISCRETE_GENE")
res <- survfit(formula = as.formula(formula_string), data = normalized_expression_new)
table$pvalue[i] <- as.numeric(gsub("[^0-9.]", "", surv_pvalue(res)$pval.txt))
}
surv_pvalue(res)$pval.txt
table$adjusted_pvalue <- p.adjust(table$pvalue, method = "BH")
So, for each gene I get the mean and stardant deviaton and convert so that low is mean - sd and high mean + sd, then get the p value for each keplen meyer curve and finally adjust the p values. Is there any better way to do this, given the previous normalization?
Best regards,
Manuel