Question

t-test between two groups for multiple rows

8 months ago

sooni ▴ 20

Hello.

My data frame has one row per bacterial gene and 6 columns in total: 3 for the control group and 3 for the experimental group. I want to run a t-test between the control and experimental groups for each row, i.e. for each gene, and get the p-value for that comparison.

Here's the R code I used to do this:

col_t_test <- function(col) {
  WT <- col(kegg_counts[1:3])    
  PK <- col(kegg_counts[4:6])  

  t_test_result <- t.test(WT, PK)
  return(c(t_test_result$estimate, t_test_result$p.value))
}

results <- t(apply(kegg_counts, 1, col_t_test))

When I run the code above, every row gets exactly the same result, so something seems wrong. Is there a better way to do this?

Thank you for your help!

Tags: R, t-test

One solution is the t_test function from the rstatix package, which provides an easy way to do this.

8 months ago by DBScan ▴ 450

A t-test isn't appropriate for count data.

8 months ago by Michael 55k

I understand that you have count data (integers) with 3 replicates per group. To devise a good testing strategy, it is important to understand where these counts come from. Please note that the accepted answer would be incorrect in this case.

8 months ago by Michael 55k
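As a rough, base-R illustration of a count-aware alternative (a technique swapped in here, not one proposed in this thread), each gene can be fitted with a quasi-Poisson GLM that uses the log library size as an offset. The helper glm_row_test and the example counts are hypothetical. With only 3 replicates per group, dedicated tools such as edgeR or DESeq2, which share information across genes, are generally the better choice.

```r
# Hypothetical helper: per-gene quasi-Poisson GLM with a library-size offset.
# counts: numeric matrix of raw counts, genes in rows, samples in columns.
# group:  factor giving the condition of each column, in column order.
glm_row_test <- function(counts, group) {
  # log total counts per sample, so groups are compared on rates, not raw counts
  log_lib <- log(colSums(counts))
  apply(counts, 1, function(y) {
    if (all(y == 0)) return(NA_real_)  # an all-zero row carries no information
    fit <- glm(y ~ group, family = quasipoisson(), offset = log_lib)
    # row 2 is the group effect; quasi families report a t-based p-value
    coef(summary(fit))[2, "Pr(>|t|)"]
  })
}

# Made-up example: 3 genes, 3 WT and 3 PK samples
counts <- rbind(
  stable  = c(1000, 1000, 1000, 1000, 1000, 1000),
  changed = c(10, 12, 11, 30, 33, 31),
  similar = c(20, 21, 19, 20, 22, 21)
)
group <- factor(rep(c("WT", "PK"), each = 3), levels = c("WT", "PK"))
pvals <- glm_row_test(counts, group)
pvals
```

The offset is a simple total-count normalization; it is only an assumption here and is itself sensitive to a few very abundant genes, which is one more reason to prefer the dedicated packages.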

8 months ago

dariober 15k

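Perhaps the machinery for differential gene expression analysis (e.g. limma, edgeR, DESeq) is what you are looking for. Regarding your code: you pass col as an argument, but inside the function you call the base R function col() on kegg_counts instead of indexing the row that apply() passes in. col() returns column indices rather than the data, so every row gets the same values (1, 2, 3) and therefore the same result. Maybe you wanted something like this: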

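col_t_test <- function(col) {
  # `col` is one row of kegg_counts, passed in by apply() as a numeric vector
  WT <- col[1:3]  # control replicates
  PK <- col[4:6]  # experimental replicates

  t_test_result <- t.test(WT, PK)
  # the two group means followed by the p-value
  return(c(t_test_result$estimate, t_test_result$p.value))
}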
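To see why the original function returns identical results for every row: base R's col() returns the column number of each cell and ignores the values, so WT and PK were both the numbers 1, 2, 3 for every gene. A small sketch with a made-up data frame (the name toy is hypothetical):

```r
# Made-up data frame standing in for kegg_counts (2 genes, 6 samples)
toy <- data.frame(WT1 = c(5, 50), WT2 = c(6, 52), WT3 = c(7, 51),
                  PK1 = c(20, 49), PK2 = c(22, 50), PK3 = c(21, 53))

# col() ignores the values and returns column numbers: every row is 1, 2, 3
idx <- col(toy[1:3])
print(idx)

# Indexing the row itself, as the corrected function does, gives the real values
row_values <- unlist(toy[1, 1:3])
print(row_values)
```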

Also, you probably want to apply some sort of multiple testing correction to the resulting p-values.
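A minimal sketch of such a correction with base R's p.adjust() and the Benjamini-Hochberg method. The p-values below are made up; in practice they would be the p-value column from the per-row tests.

```r
# Made-up raw p-values, one per gene
pvals <- c(gene1 = 0.001, gene2 = 0.01, gene3 = 0.03, gene4 = 0.5)

# Benjamini-Hochberg adjustment controls the false discovery rate
padj <- p.adjust(pvals, method = "BH")
print(padj)
```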

Finally, nit-picking:

I want to do a t-test between the control group and the experimental group, and ... I want to get the p-value between the two groups.

I think it is better to first think about what you want to estimate and only then choose an appropriate statistic. In your case, you (probably) want to estimate the difference between groups and assess to what extent that difference is compatible with the hypothesis of no difference. A t-test seems reasonable for this, but there may be better options.
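In that spirit, t.test() already returns the estimated group means and a confidence interval for their difference, which can be reported alongside the p-value. A minimal sketch for a single row with made-up values:

```r
wt <- c(5, 6, 7)     # made-up control replicates
pk <- c(20, 22, 21)  # made-up experimental replicates

tt <- t.test(pk, wt)  # Welch t-test (R's default, var.equal = FALSE)

# Estimated difference in means (pk - wt) and its 95% confidence interval
diff_est <- unname(tt$estimate[1] - tt$estimate[2])
ci <- tt$conf.int
c(difference = diff_est, lower = ci[1], upper = ci[2], p = tt$p.value)
```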

score 2 · Accepted Answer · 2024-03-15

8 months ago

ATpoint 85k

Simplest case I can think of:

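ncol <- 6
nrow <- 10000

# simulated numeric matrix: one row per gene, columns 1:3 vs 4:6 are the groups
m <- matrix(data = rnorm(ncol*nrow), nrow = nrow, ncol = ncol)

res <- lapply(1:nrow(m), function(i){

  a <- m[i, 1:3, drop = TRUE]  # group 1 values for row i
  b <- m[i, 4:6, drop = TRUE]  # group 2 values for row i
  tt <- t.test(a, b)           # Welch t-test (R's default)
  d <- data.frame(pvalue = tt$p.value, t = tt$statistic)
  return(d)

})

# one row of results per input row
do.call(rbind, res)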

Not efficient, but it still runs in under a second on 10,000 rows, so it is good enough without any fancy packages.
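If speed ever matters, the same Welch t-test (the t.test() default) can be computed for all rows at once with matrix arithmetic. This is a sketch, not part of the original answer; the function name row_welch_test is hypothetical, and it assumes a numeric matrix with at least two replicates and non-constant values in each group.

```r
# Hypothetical vectorized Welch t-test, one test per row of a numeric matrix.
# g1, g2: column indices of the two groups.
row_welch_test <- function(x, g1, g2) {
  a <- x[, g1, drop = FALSE]
  b <- x[, g2, drop = FALSE]
  na <- ncol(a); nb <- ncol(b)
  ma <- rowMeans(a); mb <- rowMeans(b)
  # per-row sample variances; recycling down each column subtracts each row's own mean
  va <- rowSums((a - ma)^2) / (na - 1)
  vb <- rowSums((b - mb)^2) / (nb - 1)
  sa <- va / na; sb <- vb / nb
  t_stat <- (ma - mb) / sqrt(sa + sb)
  # Welch-Satterthwaite degrees of freedom
  df <- (sa + sb)^2 / (sa^2 / (na - 1) + sb^2 / (nb - 1))
  data.frame(t = t_stat, df = df, pvalue = 2 * pt(-abs(t_stat), df))
}

set.seed(1)
m <- matrix(rnorm(6 * 10000), ncol = 6)
fast <- row_welch_test(m, 1:3, 4:6)
head(fast)
```

The results should match t.test() row by row; the loop version is easier to read, while this one avoids 10,000 separate function calls.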

The following error occurs:

Error in var(x) : is.atomic(x) is not TRUE
In addition: Warning message:
In mean.default(x) : argument is not numeric or logical: returning NA

For what it's worth, my original data frame is entirely numeric.

8 months ago by sooni ▴ 20

The posted code works on a numeric matrix. Sanitize your data.

8 months ago by ATpoint 85k
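The error messages are consistent with m being a data frame rather than a matrix: for a data frame, m[i, 1:3, drop = TRUE] returns a list, and t.test() cannot take the mean or variance of a list. A minimal sketch of the fix, assuming all six columns are numeric (the data frame counts_df is made up):

```r
# Made-up data frame standing in for kegg_counts
counts_df <- data.frame(WT1 = c(5, 50), WT2 = c(6, 52), WT3 = c(7, 51),
                        PK1 = c(20, 49), PK2 = c(22, 50), PK3 = c(21, 53))

# A single row of a data frame with drop = TRUE is a list, not a numeric vector
str(counts_df[1, 1:3, drop = TRUE])

# Convert to a numeric matrix first; row subsets are then plain numeric vectors
m <- as.matrix(counts_df)
stopifnot(is.numeric(m))  # fails if any column was character or factor
tt <- t.test(m[1, 1:3], m[1, 4:6])
tt$p.value
```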

I agree with @dariober: make sure that existing specialized software (e.g. limma) cannot do this much better.

8 months ago by ATpoint 85k

I think the OP has bacterial gene counts annotated to KEGG categories, as I infer from the original post. Student's t-test is not appropriate for analyzing these data. It will of course deliver a p-value, but a meaningless one. Therefore, I think this solution is not correct.