Two Sample t-test with bootstrapping for gene expression matrix in R
3.8 years ago
Expert ▴ 10

I tried Bioconductor packages for differential gene expression analysis (edgeR, DESeq2, limma) and obtained differentially expressed genes with each of them. I want to compare those results with a two-sample t-test with bootstrapping, but I do not understand this method very well. For example, my table has 5 control and 5 treatment samples (columns) and 1000 genes (rows). Can we find differentially expressed genes by applying a bootstrap two-sample t-test to each gene? I searched the literature, but I could not find a good solution for gene expression analysis. I tried the code below to run the t-test with the "boot" package:

library(boot)
boot.tee <- function(data, i){
  data <- as.matrix(data)
  for (i in 1:data) {
    t.test(sample(data[i,1:5], 5, replace=T), sample(data[i,6:10], 5, replace=T), paired = FALSE)$p.value
  }
}
boot.out <- boot(data=LogT_matrix, statistic=boot.tee, R=10)

Then I received a warning message:

In 1:data : numerical expression has 10000 elements: only the first used

There are some examples on this page, http://ww2.coastal.edu/kingw/statistics/R-tutorials/resample.html, but I want to obtain p-values for all genes in my table, like the topTable and topTags tables in the limma and edgeR packages. I can run a standard t-test on my data, but I could not get the bootstrap t-test to work. Can bootstrap statistics be applied to each gene? Thank you.

RNA-Seq R • 1.2k views

I scanned literature, but I could not find good solutions for gene expression analysis.

Of course not. t-tests are not well suited to gene expression (or any high-throughput) assays with limited replicate numbers such as 5 vs 5, which is why specialized tools such as DESeq2 and edgeR were developed. Don't reinvent the wheel or waste your time; use them rather than homebrew methods. If t-tests with permutation were a valid option, the field would be applying this very obvious approach, and statisticians would not have spent their time developing alternatives.


The example can be scaled up; data sets with 10 controls and 30 treatments may also be available. I just gave a small example to run the code. The bootstrap t-test is a powerful test, and the results obtained can be used for meta-analysis. The purpose of statistics is to develop appropriate methods.


The t-test is for continuous data, and RNA-seq data is discrete. I am still not sure why you want to torture the data so much, when most available software will appropriately model the count data using the negative binomial distribution.

3.8 years ago
e.rempel ★ 1.1k

Can Bootstrap Statistics be applied to each gene?

I think it can, but there appears to be an error in your code. The warning message implies that you should probably replace the expression

1:data

with

1:nrow(data)

since i iterates over the rows (genes), right?
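Beyond that one-line fix, note that in boot() the second argument of the statistic function is the vector of resampled row indices, so reusing i as the loop variable overwrites it, and the for loop itself returns nothing. Here is a minimal sketch of one way to get a bootstrap p-value per gene using base R only, not the boot package. It assumes columns 1-5 are controls and 6-10 are treatments, as in the question. The names welch_t, boot_t_pvalue and expr are hypothetical, the matrix is simulated as a stand-in for LogT_matrix, and shifting both groups to the pooled mean before resampling is just one common way to impose the null hypothesis, so treat it as a starting point rather than the definitive method.

```r
# Welch t statistic computed by hand, so a zero-variance resample yields
# Inf/NaN instead of the error that t.test() would raise.
welch_t <- function(x, y) {
  (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))
}

# Bootstrap two-sample t-test for a single gene (hypothetical helper).
boot_t_pvalue <- function(x, y, R = 999) {
  t_obs <- welch_t(x, y)
  if (!is.finite(t_obs)) return(NA_real_)  # e.g. a gene that is constant everywhere
  # Impose the null hypothesis: shift both groups to the pooled mean.
  pooled_mean <- mean(c(x, y))
  x0 <- x - mean(x) + pooled_mean
  y0 <- y - mean(y) + pooled_mean
  # Resample within each group, with replacement, R times.
  t_star <- replicate(R, welch_t(sample(x0, replace = TRUE),
                                 sample(y0, replace = TRUE)))
  # Two-sided p-value; the +1 keeps it strictly above zero.
  (sum(abs(t_star) >= abs(t_obs), na.rm = TRUE) + 1) / (R + 1)
}

set.seed(1)
# Simulated stand-in for LogT_matrix: 20 genes x 10 samples (assumption).
expr <- matrix(rnorm(20 * 10), nrow = 20,
               dimnames = list(paste0("gene", 1:20), NULL))
expr[1, 6:10] <- expr[1, 6:10] + 10  # one gene with a large shift
ctrl <- 1:5
trt  <- 6:10

# One bootstrap p-value per row (gene), then a sorted results table.
p_boot <- apply(expr, 1, function(g) boot_t_pvalue(g[ctrl], g[trt], R = 499))
res <- data.frame(gene    = rownames(expr),
                  p.value = p_boot,
                  adj.p   = p.adjust(p_boot, method = "BH"))
res <- res[order(res$p.value), ]
head(res)
```

The resulting table is sorted by p-value with a Benjamini-Hochberg adjusted column, which is roughly the shape of a topTable/topTags output. With only 5 samples per group the smallest achievable p-value is 1/(R + 1) and the resampling distribution is coarse, so the concerns raised in the comments above still apply.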

The point raised by ATpoint is an interesting one. You have to find a balance: use existing, accepted code, but also try things out, compare them with existing tools, and get your hands dirty.
