Question

edgeR analysis w/ factor (Body of Water) and covariate data (Latitude) - Can you look at Latitude w/in Body of Water & Across Body of Water?

0

Entering edit mode

2.4 years ago

Gina • 0

I am using edgeR to look at differential community composition (bacterial taxonomy counts) - this can be treated just like RNA-Seq data. I have a factor (WaterBody) and a covariate (Latitude) for this dataset. There is an interaction effect based on adonis (and biologically it makes sense).

Extra details on data:

WaterBody: North_Atlantic, North_Pacific, Mediterranean_Sea
Latitude: 32, 34, 37, 38, 42, 43, 45, 46, 49, 52, 55, 58, 64, 67
Not all bodies of water were sampled at every latitude.

Below is what I currently have for my script:

y <- DGEList(counts=Counts_Bact, sample = Sample_Data)
y <- edgeR::calcNormFactors(y, method ="TMMwsp")
design <- model.matrix(~WaterBody + WaterBody:Lat, data=Sample_Data)
rownames(design) <- colnames(y)
disp = estimateDisp(y, design, robust=TRUE)
fit <- glmQLFit(disp, design, robust = TRUE)

At this point I have the following coefficients:

WaterBodyNorth_Atlantic
WaterBodyNorth_Pacific
WaterBodyNorth_Mediterranean_Sea
WaterBodyNorth_Atlantic:Lat
WaterBodyNorth_Pacific:Lat
WaterBodyNorth_Mediterranean_Sea:Lat

So now to reiterate my question. I wanted to compare bacterial taxonomy counts for certain latitudes within a body of water as well as across bodies of water (say which bacteria are differentially present in North Atlantic at 67N vs 45N and then North Pacific vs North Atlantic at 45N).

My understanding is what I have above is incorrect and I am going about it all wrong. It also may just simply not be possible? Was hoping to get input since looking at references below haven't quite gotten me there:

Ref 1: https://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/designmatrices.html#overview-of-models-fitted

Ref 2: https://www.bioconductor.org/packages/devel/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

Any advice/insight would be greatly appreciated!

edgeR • 1.1k views

ADD COMMENT • link updated 2.4 years ago by Kevin Blighe 88k • written 2.4 years ago by Gina • 0

score 0 · Answer 1 · 2022-07-30

Hey Gina,

This seems like a situation where a compound model term may be the simplest solution. So, that would be, e.g., creating a group variable as a compound of WaterBody and Latitude. Then, your model formula would just be:

~ group

After that, you could just compare different terms on a pairwise basis.

If you also need a comparison of form North Pacific vs North Atlantic at 45N, then you could re-work the group variable in the desired way and then re-fit the model and conduct the comparison.

Sometimes trying to get the perfect formula is just impossible and creating interactions can result in over-fitting the model.

Kevin