Question

Heirarchal regression for model comparison, RNA-seq

0

Entering edit mode

3.7 years ago

RNAseqer ▴ 270

Does anyone have experience doing hierarchical regression with an RNA-seq dataset? I was hoping to add a series of dummy variables encoding the five levels of my batch effect to determine which batch if any is having a significant effect

mod.b2 <- model.matrix(~ factDx + batch2)
mod.b2b5 <- model.matrix(~ factDx + batch2 + batch5)
mod.b2b5b4 <- model.matrix(~ factDx + batch2 + batch5 + batch4)
mod.b2b5b4b1 <- model.matrix(~ factDx + batch2 + batch5 + batch4 +batch1)
Etc...

Its pretty easy to set up the models but I am unsure of how to set up/compare the Rsquared values of those models for each individual gene. My thinking was that I could create a list of Rsquared values and then look at the change in Rsquared resulting from the addition of a variable to identify whether the additional variable was having a significant effect on few/many genes and which genes specifically were affected by which batch level. I have not found any pipeline or example attempting this though, and was wondering if anyone else had such an example they could point me to.

R RNA-seq heirarchalregression • 941 views

ADD COMMENT • link updated 3.7 years ago by ATpoint 85k • written 3.7 years ago by RNAseqer ▴ 270

4

Entering edit mode

It's probably easier and more informative to start out with a PCA plot. See the PCAtools vignette. Check if any of the PCs are segregating your samples by batch.

ADD REPLY • link 3.7 years ago by rpolicastro 13k

1

Entering edit mode

You make it far too difficult to yourself, this goes for this post and all the others on the basically same topic before. You seem to have a normal RNA-seq dataset, therefore simply follow established procedures rather than coming up with very custom solutions. Perform a PCA or MDS, see whether you have evidence for the batch effect, and then include it into the design given it is not confounded with factDx. That's it, from there on simply follow edgeR manual (which also covers batch effect by the way). Agreed with rpolicastro on PCAtools, that is probably the simplest solution. You can also check for a simple analysis: https://www.biostars.org/p/461026/ which covers both simple batch diagnostics with PCAtools and a DEG analysis via QLF framework with edgeR.

ADD REPLY • link 3.7 years ago by ATpoint 85k