Question

Two Sample t-test with bootstrapping for gene expression matrix in R

1


4.2 years ago

Expert ▴ 10

I tried Bioconductor packages for differential gene expression analysis (edgeR, DESeq2, limma) and obtained differentially expressed genes with each of them. I want to compare those results with a two-sample t-test with bootstrapping, but I do not understand this method very well. For example, my table has 5 control and 5 treatment samples (columns) and 1000 genes (rows). Can we find differentially expressed genes by applying a bootstrap two-sample t-test to each gene? I searched the literature, but I could not find a good solution for gene expression analysis. I tried the code below, running the t-test through the "boot" package:

library(boot)
boot.tee <- function(data, i){
  data <- as.matrix(data)
  for (i in 1:data) {
    t.test(sample(data[i,1:5], 5, replace=T), sample(data[i,6:10], 5, replace=T), paired = FALSE)$p.value
  }
}
boot.out <- boot(data=LogT_matrix, statistic=boot.tee, R=10)

Then I received a warning message:

In 1:data : numerical expression has 10000 elements: only the first used

On this page, http://ww2.coastal.edu/kingw/statistics/R-tutorials/resample.html, there are some examples, but I want to obtain p-values for all genes in my table, like the topTable and topTags tables in the limma and edgeR packages. I can run a standard t-test on my data, but I could not get it to work as a bootstrap t-test. Can Bootstrap Statistics be applied to each gene? Thank you.

RNA-Seq R • 1.4k views

ADD COMMENT • link updated 4.2 years ago by e.rempel ★ 1.1k • written 4.2 years ago by Expert ▴ 10

0


I scanned literature, but I could not find good solutions for gene expression analysis.

Of course not. t-tests are not well suited to gene expression (or any high-throughput) assays with limited replicate numbers such as 5 vs 5, which is why specialized tools such as DESeq2 and edgeR have been developed. Don't reinvent the wheel or waste your time: use them rather than homebrew methods. If t-tests with permutation were a valid option, the field would be applying this very obvious approach and statisticians would not have spent their time developing alternatives.

ADD REPLY • link 4.2 years ago by ATpoint 87k

0


The example can be expanded; 10 controls and 30 treatments may also be available. I just gave a small example to run the code. The bootstrap t-test is a powerful test, and the results obtained can be used for meta-analysis. The purpose of statistics is to develop appropriate methods.

ADD REPLY • link 4.2 years ago by Expert ▴ 10

0


The t-test is for continuous data, and RNA-seq data are discrete. I am still not sure why you want to torture the data so much when most available software will appropriately model the count data using the negative binomial distribution.

ADD REPLY • link 4.2 years ago by rpolicastro 13k

score 1 · Answer 1 · 2021-02-23

Can Bootstrap Statistics be applied to each gene?

I think it can. I suspect there is an error in your code. The warning message implies that you should probably replace the expression

1:data

with

1:nrow(data)

since i is an iterator over rows (genes), right?
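Even with that index fixed, the for loop inside boot.tee does not return its p-values, so it may be easier to write the resampling by hand. Below is a minimal sketch of one common way to do a per-gene bootstrap t-test in base R, not necessarily what you had in mind: each group is shifted to the pooled mean so that resampling happens under the null hypothesis, and the p-value is the share of bootstrap t statistics at least as extreme as the observed one. The helper names welch_t and boot_t_pvalue and the simulated matrix expr (a stand-in for your LogT_matrix) are made up for illustration, and I assume columns 1:5 are controls and 6:10 are treatments.

```r
# Welch t statistic for two numeric vectors
welch_t <- function(x, y) {
  (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))
}

# Bootstrap p-value for one gene (hypothetical helper, not part of the boot package)
boot_t_pvalue <- function(x, y, B = 1000) {
  t_obs <- welch_t(x, y)
  # Shift both groups to the pooled mean so that resampling is done under H0
  pooled_mean <- mean(c(x, y))
  x0 <- x - mean(x) + pooled_mean
  y0 <- y - mean(y) + pooled_mean
  # Resample within each group with replacement and recompute the statistic
  t_star <- replicate(B, welch_t(sample(x0, replace = TRUE),
                                 sample(y0, replace = TRUE)))
  # Drop degenerate resamples (both groups constant, giving NaN or Inf)
  t_star <- t_star[is.finite(t_star)]
  # Two-sided p-value; the +1 terms keep it strictly above zero
  (sum(abs(t_star) >= abs(t_obs)) + 1) / (length(t_star) + 1)
}

set.seed(1)
# Simulated stand-in for LogT_matrix: 20 genes x 10 samples
expr <- matrix(rnorm(20 * 10), nrow = 20,
               dimnames = list(paste0("gene", 1:20), NULL))
expr[1, 6:10] <- expr[1, 6:10] + 5  # shift gene1 in the treatment group

# One bootstrap p-value per gene (row), then Benjamini-Hochberg adjustment
p_boot <- apply(expr, 1, function(g) boot_t_pvalue(g[1:5], g[6:10], B = 500))
result <- data.frame(p_value = p_boot,
                     p_adj = p.adjust(p_boot, method = "BH"),
                     row.names = rownames(expr))
head(result[order(result$p_value), ])
```

The smallest p-value this can return is 1/(B + 1), so B has to be large if you want the values to survive multiple-testing correction across 1000 genes. With only 5 samples per group the bootstrap distribution is also quite coarse, which is part of the concern raised in the comments.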

The point mentioned by ATpoint is an interesting one. You have to find a balance: use existing, accepted code, but also try things out, compare them with existing tools, and get your hands dirty.