Dear Collegues,
I am trying to perform analysis of differentially expressed genes between multiple groups. For that I found some tutorial (https://jef.works/blog/2017/05/31/multiclass-diffential-expression-analysis/). I am interested in t-test only between one group vs rest. This is corresponding code part:
pv.sig <- lapply(levels(groups), function(g){
ingroup <- names(groups)[groups %in% g]
outgroup <- names(groups)[!(groups %in% g)]
pv <- sapply(1:M, function(i) {
t.test(x[i,ingroup], x[i,outgroup])$p.value
})
names(pv) <- rownames(x)
pv.sig <- names(pv)[pv < 0.05/M/G] ## bonferonni
pv.sig
})
When I run it on data created in the tutorial everything is fine. When I run it on my data each time I get such error:
Error in 'x[i, ingroup]': subscript out of bounds
This is how my data looks like: my data
My example file may be downloaded here: example data
This is how my code looks like:
x = read.table("~/Desktop/DAT.txt", sep = "\t", dec = ".", header = T)
groups = c(rep("B", 505), rep("C", 304), rep("O", 301), rep("U", 370))
groups = as.factor(groups)
rownames(x) = x$X
x$X = NULL
x = as.matrix(x)
names(groups) <- colnames(x)
M = 1480
G = 4
pv.sig <- lapply(levels(groups), function(g){
ingroup <- names(groups)[groups %in% g]
outgroup <- names(groups)[!(groups %in% g)]
pv <- sapply(1:M, function(i) {
t.test(x[i,ingroup], x[i,outgroup])$p.value
})
names(pv) <- rownames(x)
pv.sig <- names(pv)[pv < 0.05/M/G] ## bonferonni
pv.sig
})
I've already tested many different combinations, but I am too weak in writing functions to solve what is wrong. So, I ask you as pros for help and I'll appreciate it much.
Best regards!
Is this RNA-seq? If so, I'd highly recommend using an established method based on negative binomial distributions like DESeq2 or edgeR rather than trying to use a homebrew method. They will be more accurate and sensitive, plus they handle outliers in sensible ways and are flexible enough to do any sort of comparison you'd want.
They have very good documentation and are generally easy to use as well.
Indeed, this is RNA-seq. Thank you for your suggestion, I think I'll do that as the method you mentioned are broadly recognized.
Nevertheless, as my abilities to write functions are very poor and I am trying to learn something, I wish I could solve the presented issue anyway. Just for my knowledge.
I don't think your
ingroup
(oroutgroup
) variables will end up with anything in them as they currently are. Generally, it is often a good idea to run your logic outside of a function first (or abuse print statements) to be sure you're getting what you expect at each step.