Hi, I'm a beginner in microarray data analysis and I'm trying to identify differentially expressed genes in the GEO dataset GSE131744. This study investigates gene expression profiles in two groups of cells, TMZ-sensitive and TMZ-resistant, and was run on the Illumina HumanHT-12 V4.0 platform. I would like to obtain the log2FC and p-values for all genes, using the TMZ-sensitive group as the reference. limma seems to be the popular method of choice for differential expression analysis, as indicated in many posts. As suggested in "Illumina HumanHT-12 V4.0 expression beadchip" and https://support.bioconductor.org/p/92834/, I performed my analysis in limma using the non-normalized data (GSE131744_non-normalized.txt).
Here's my code:
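library(limma)

# Read the non-normalized Illumina export
x <- read.ilmn(files = "GSE131744_non-normalized.txt",
               expr = "SAMPLE ",
               probeid = "ID_REF")
expr <- x$E
expr <- na.omit(expr)
pval <- x@.Data[[3]]$Detection
pval <- na.omit(pval)

# Background-correct, quantile-normalize and log2-transform
y <- neqc(x = expr,
          detection.p = pval)

# One sample per group; "Sensitive" is listed first so it is the reference level
Group <- c("Sensitive", "Resistant")
Group <- factor(Group, levels = c("Sensitive", "Resistant"))
design <- model.matrix(~Group)

fit <- lmFit(object = y,
             design = design)
fit <- eBayes(fit = fit,
              trend = TRUE,
              robust = TRUE)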
The eBayes function however throws this error:
Error in .ebayes(fit = fit, proportion = proportion, stdev.coef.lim = stdev.coef.lim, : No finite residual standard deviations
I'm not able to figure out what this error means and how to fix it. Please help.
You cannot analyse data without any replication. That is what the error tells you.
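To see why, here is a minimal sketch in base R, assuming the design from the question (one Sensitive and one Resistant sample). The expression values are made up for illustration. The two model coefficients (intercept and group effect) use up both observations, so no residual degrees of freedom remain and no standard deviation can be estimated.

```r
# Design as in the question: one sample per group, Sensitive as reference
group <- factor(c("Sensitive", "Resistant"),
                levels = c("Sensitive", "Resistant"))
design <- model.matrix(~ group)

n_samples <- nrow(design)          # number of arrays
n_coef <- qr(design)$rank          # intercept + group effect
residual_df <- n_samples - n_coef  # degrees of freedom left for the variance

# The same thing for a single probe with made-up log2 values
fit <- lm(c(7.2, 8.1) ~ group)
print(residual_df)
print(df.residual(fit))
```

Both printed values are 0. With at least two samples in one of the groups, the residual degrees of freedom become positive and a variance can be estimated.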
Does it mean it's impossible to calculate log2FC without having replicates in each condition? Is there any other method to do the same?
Unreplicated experiments are not suitable for making serious statements about the effects of a treatment. I would not even consider this dataset. Either find a suitable published dataset with replicates, or generate one yourself.
You can do this directly with GEO2R, can't you?
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE131744
I tried GEO2R but it gives me only the log-transformed expression values in each condition. I'm looking for the log2FC and p-value for each gene. Is there a way to calculate these from the log-transformed values?
Not without replication. I am not saying this to mock you. Find a better dataset and move on.
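To be precise about what is and is not possible: the log2 fold change by itself is just the difference between the two log2 values, so it can be computed. What cannot be obtained from one sample per group is a p-value. A minimal sketch in base R, assuming a matrix of log2 expression values with one column per condition; the probe names and numbers are hypothetical.

```r
# Made-up log2 expression values for three hypothetical probes
log2_expr <- matrix(c(7.00, 9.50, 6.25,   # Sensitive
                      8.00, 9.50, 5.25),  # Resistant
                    ncol = 2,
                    dimnames = list(c("probe_A", "probe_B", "probe_C"),
                                    c("Sensitive", "Resistant")))

# log2FC relative to the Sensitive sample: a plain difference on the log2 scale
log2fc <- log2_expr[, "Resistant"] - log2_expr[, "Sensitive"]
print(log2fc)
```

A ranking by such differences is descriptive only. Without replicates there is no way to tell a real effect from array-to-array noise.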
I looked at the samples and there are no replicates, so it is better not to go ahead with this type of data.