Chi-squared assessment of gene expression and batch effect
3.8 years ago
RNAseqer ▴ 280

I have been asked to analyze changes in the expression of 2000 genes across 300 individuals in response to a batch variable with five levels, using chi-squared analysis, specifically in order to extract t-statistics from the results. However, I am not sure where to start. Is there a way to perform such an analysis in edgeR? I'd be very grateful for any assistance.

RNA-Seq r chi squared • 1.6k views
3.8 years ago
Gordon Smyth ★ 7.7k

You can't extract t-statistics from edgeR, and there's no need to do so anyway. To test for differences between the batch levels, just run glmQLFTest with coef= set to all the batch coefficients. That will do an F-test. No need for contrasts at all.
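As a minimal sketch of that F-test, assuming simulated counts and a hypothetical five-level factor called `batch` (neither comes from the original data), the workflow could look like this:

```r
library(edgeR)

set.seed(1)
# Simulated data (hypothetical): 200 genes x 30 libraries in 5 batches
n.genes <- 200
batch <- factor(rep(paste0("B", 1:5), each = 6))
counts <- matrix(rnbinom(n.genes * length(batch), mu = 100, size = 10),
                 nrow = n.genes,
                 dimnames = list(paste0("gene", seq_len(n.genes)), NULL))

y <- DGEList(counts = counts)
y <- calcNormFactors(y)

# With an intercept, columns 2 to 5 of the design are the four batch coefficients
design <- model.matrix(~ batch)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)

# One quasi-likelihood F-test per gene on all batch coefficients together
qlf <- glmQLFTest(fit, coef = 2:ncol(design))
topTags(qlf)
```

If the design also contains other covariates, the batch columns can be picked out by name instead, for example with `grep("^batch", colnames(design))`.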

The (so-called) chi-squared test is something different. I assume your statistical adviser is thinking of it as a way to check whether the batches are merely technical replicates. If so, there is a better way to do that check in edgeR: include sample ID as a factor in the linear model. If the batches are just technical replicates, the between-batch within-sample variability should be small and the edgeR dispersion estimates should be close to zero.
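A rough sketch of that check, assuming each sample was sequenced in several batches. The data here are simulated with purely Poisson (technical) variation between batches, and the names `sample.id` and `idx` are hypothetical:

```r
library(edgeR)

set.seed(2)
# Hypothetical set-up: 10 samples, each sequenced in 3 batches (30 libraries)
n.genes <- 200
idx <- rep(1:10, times = 3)
sample.id <- factor(paste0("S", idx))

# Gene-by-sample expected counts; each sample keeps the same mean in every batch
mu <- matrix(rgamma(n.genes * 10, shape = 2, rate = 0.02), nrow = n.genes)
mu.all <- mu[, idx]

# Purely technical (Poisson) variation between batches of the same sample
counts <- matrix(rpois(length(mu.all), mu.all), nrow = n.genes)

y <- DGEList(counts = counts)
y <- calcNormFactors(y)

# Sample ID is the only factor, so the residual variation is between-batch, within-sample
design <- model.matrix(~ sample.id)
y <- estimateDisp(y, design)

# Close to zero if the batches behave like technical replicates
y$common.dispersion
```

With real data, a common dispersion well above zero would suggest that the batches differ by more than technical noise.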

Whether or not you plan to do a formal test, you should use plotMDS to explore graphically whether the batches appear to be different.
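For the graphical check, a minimal sketch on simulated counts might look like the following. The factor `batch` and the artificial shift added to batch B5 are assumptions made purely for illustration:

```r
library(edgeR)

set.seed(3)
# Simulated data (hypothetical): 500 genes x 30 libraries in 5 batches
n.genes <- 500
batch <- factor(rep(paste0("B", 1:5), each = 6))
counts <- matrix(rnbinom(n.genes * length(batch), mu = 100, size = 10),
                 nrow = n.genes)

# Artificial batch effect: triple the counts of the first 50 genes in batch B5
counts[1:50, batch == "B5"] <- counts[1:50, batch == "B5"] * 3

y <- calcNormFactors(DGEList(counts = counts))

# Label and colour each library by its batch
mds <- plotMDS(y, labels = as.character(batch), col = as.integer(batch))
```

Libraries that cluster by batch in the plot would point to a batch effect worth modelling.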


Thank you for the suggestion! If I understand you correctly, and I understand the edgeR manual's section on glmQLFTest(), I can use the output of glmQLFTest() to identify genes significantly affected by each individual batch?

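library(edgeR)
# Design: diagnosis plus library batch (both factors)
mod <- model.matrix(~ factDx + LibraryBatchall)
dge.estDisp.mod <- estimateDisp(dge, mod)
fit <- glmQLFit(dge.estDisp.mod, mod)
# coef must match a column name of mod; check colnames(mod) for the exact batch coefficient
qlf.specific.batch <- glmQLFTest(fit, coef="specific.batchID")
topTags(qlf.specific.batch)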

Absolutely will add the plotMDS graphical exploration to my to-do list as well.

3.8 years ago
halo22 ▴ 300

Are you sure that chi-squared is the correct test? I am not sure you can do a chi-squared test in edgeR. If the goal is to account for the changes in gene expression across different levels, why not use edgeR to do a differential expression analysis?


Well, someone with far superior knowledge of statistical techniques asked me to do it, and I'll certainly be asking them the same question later. But for right now I'd just like to perform the test and have results to discuss. A tetrachoric comparison was also suggested as a possibility, but I'm not familiar with it.

In terms of running a regular regression on batch with five levels, I'm not sure how to set up a contrast matrix that would allow me to test for DE between the five levels. I know how to set up a contrast to look for DE between two levels of batch, but I want to compare all levels and am unsure how to set that up (if you have an example of that, it would be extremely helpful). The real interest is in getting the t-statistics.


You can use R to run a chi-squared test; I am just not sure whether the underlying data/gene distribution is suitable for it. Check this document https://rstudio-pubs-static.s3.amazonaws.com/79395_b07ae39ce8124a5c873bd46d6075c137.html to see how edgeR is used for multi-group comparisons. PS: I am not a statistician.
