I tried Bioconductor packages for differential gene expression analysis, such as edgeR, DESeq2, and limma, and obtained differentially expressed genes with each of them. I want to compare these results with a two-sample t-test with bootstrapping, but I do not understand this method very well. For example, my table has 5 control and 5 treatment samples (columns) and 1000 genes (rows). Can we find differentially expressed genes by applying a bootstrap two-sample t-test to each gene? I searched the literature, but I could not find a good solution for gene expression analysis. I tried the code below, which calls t.test inside the "boot" package:
library(boot)
boot.tee <- function(data, i){
  data <- as.matrix(data)
  for (i in 1:data) {
    t.test(sample(data[i,1:5], 5, replace=T ),
           sample(data[i,6:10], 5, replace=T),
           paired = FALSE)$p.value
  }
}
boot.out <- boot(data=LogT_matrix, statistic=boot.tee , R=10)
Then I received a warning message:
In 1:data : numerical expression has 10000 elements: only the first used
On this page, http://ww2.coastal.edu/kingw/statistics/R-tutorials/resample.html, there are some examples, but I want to obtain p-values for all genes in my table, like the topTable and topTags tables in the limma and edgeR packages. I can run a standard t-test on my data, but I could not adapt it to a bootstrap t-test. Can bootstrap statistics be applied to each gene? Thank you.
Of course not. The t-test is not well suited to gene expression (or any high-throughput) assays with limited replicate numbers such as 5 vs 5; this is why specialized software such as DESeq2 and edgeR has been developed. Don't reinvent the wheel or waste your time: use these tools rather than homebrew methods. If t-tests with permutation were a valid option, the field would be applying this very obvious approach, and statisticians would not have spent their time developing alternatives.
The example can be expanded; 10 controls and 30 treatments may also be available. I just gave a small example to run the code. The bootstrap t-test is a powerful test, and its results can be used for meta-analysis. The purpose of statistics is to develop appropriate methods.
The t-test is for continuous data, and RNA-seq data is discrete. I am still not sure why you want to torture the data so much, when most available software will appropriately model the count data using the negative binomial distribution.
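For reference, the warning in the question comes from `1:data`: the `:` operator receives the whole matrix as its upper bound and uses only its first element, so the loop never iterates over genes. Looping over `seq_len(nrow(data))` avoids that. Below is a minimal base-R sketch of a per-gene bootstrap t-test, assuming log-transformed values that can be treated as roughly continuous. `boot_t_test` is a hypothetical helper written for illustration (it is not part of the "boot" package), the simulated matrix `expr` stands in for `LogT_matrix`, and the column layout (1:5 control, 6:10 treatment) is taken from the question. Both groups are shifted to the pooled mean before resampling so that the bootstrap distribution is generated under the null hypothesis. The objections raised in the replies still apply: with 5 vs 5 samples the resampling distribution is coarse, so this is an illustration of the mechanics rather than a recommended analysis.

```r
# Hypothetical helper: bootstrap two-sample (Welch) t-test for a single gene.
# x, y: numeric vectors of expression values; B: number of bootstrap resamples.
boot_t_test <- function(x, y, B = 2000) {
  welch_t <- function(a, b) {
    (mean(a) - mean(b)) / sqrt(var(a) / length(a) + var(b) / length(b))
  }
  t_obs <- welch_t(x, y)

  # Shift both groups to the pooled mean so resampling happens under H0.
  pooled_mean <- mean(c(x, y))
  x0 <- x - mean(x) + pooled_mean
  y0 <- y - mean(y) + pooled_mean

  t_star <- replicate(B, welch_t(sample(x0, replace = TRUE),
                                 sample(y0, replace = TRUE)))

  # Two-sided p-value with a +1 correction; degenerate resamples (NaN) are dropped.
  ok <- !is.na(t_star)
  (1 + sum(abs(t_star[ok]) >= abs(t_obs))) / (1 + sum(ok))
}

set.seed(1)
# Simulated stand-in for LogT_matrix: 20 genes x 10 samples.
expr <- matrix(rnorm(200), nrow = 20)
expr[1, 6:10] <- expr[1, 6:10] + 5  # shift the treatment values of gene 1

# One p-value per gene (row), then Benjamini-Hochberg adjustment.
p_values <- apply(expr, 1, function(g) boot_t_test(g[1:5], g[6:10], B = 500))
results <- data.frame(gene    = seq_len(nrow(expr)),
                      p_value = p_values,
                      p_adj   = p.adjust(p_values, method = "BH"))
head(results[order(results$p_value), ])
```

The resulting data frame is only loosely analogous to a topTable or topTags output: it has one row per gene with a raw and an adjusted p-value, but no moderated statistics or count-based modelling.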