Code for multiclass differential gene expression analysis
0
0
Entering edit mode
4.8 years ago
orzech_mag ▴ 230

Dear Collegues,

I am trying to perform analysis of differentially expressed genes between multiple groups. For that I found some tutorial (https://jef.works/blog/2017/05/31/multiclass-diffential-expression-analysis/). I am interested in t-test only between one group vs rest. This is corresponding code part:

pv.sig <- lapply(levels(groups), function(g){
  ingroup <- names(groups)[groups %in% g]
  outgroup <- names(groups)[!(groups %in% g)]
  pv <- sapply(1:M, function(i) {
    t.test(x[i,ingroup], x[i,outgroup])$p.value
  })
  names(pv) <- rownames(x)
  pv.sig <- names(pv)[pv < 0.05/M/G] ## bonferonni
  pv.sig
})

When I run it on data created in the tutorial everything is fine. When I run it on my data each time I get such error:

Error in 'x[i, ingroup]': subscript out of bounds

This is how my data looks like: my data

My example file may be downloaded here: example data

This is how my code looks like:

x = read.table("~/Desktop/DAT.txt", sep = "\t", dec = ".", header = T)
groups = c(rep("B", 505), rep("C", 304), rep("O", 301), rep("U", 370))
groups = as.factor(groups)

rownames(x) = x$X
x$X = NULL
x = as.matrix(x)
names(groups) <- colnames(x)

M = 1480
G = 4
pv.sig <- lapply(levels(groups), function(g){
  ingroup <- names(groups)[groups %in% g]
  outgroup <- names(groups)[!(groups %in% g)]
  pv <- sapply(1:M, function(i) {
    t.test(x[i,ingroup], x[i,outgroup])$p.value
  })
  names(pv) <- rownames(x)
  pv.sig <- names(pv)[pv < 0.05/M/G] ## bonferonni
  pv.sig
})

I've already tested many different combinations, but I am too weak in writing functions to solve what is wrong. So, I ask you as pros for help and I'll appreciate it much.

Best regards!

deg ttest function • 1.6k views
ADD COMMENT
0
Entering edit mode

Is this RNA-seq? If so, I'd highly recommend using an established method based on negative binomial distributions like DESeq2 or edgeR rather than trying to use a homebrew method. They will be more accurate and sensitive, plus they handle outliers in sensible ways and are flexible enough to do any sort of comparison you'd want.

They have very good documentation and are generally easy to use as well.

ADD REPLY
0
Entering edit mode

Indeed, this is RNA-seq. Thank you for your suggestion, I think I'll do that as the method you mentioned are broadly recognized.

Nevertheless, as my abilities to write functions are very poor and I am trying to learn something, I wish I could solve the presented issue anyway. Just for my knowledge.

ADD REPLY
0
Entering edit mode

I don't think your ingroup (or outgroup) variables will end up with anything in them as they currently are. Generally, it is often a good idea to run your logic outside of a function first (or abuse print statements) to be sure you're getting what you expect at each step.

ADD REPLY

Login before adding your answer.

Traffic: 2565 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6