My experiment is the following:
- Temperature 26C vs. Temperature 30C
- Treatment Saline vs. Treatment BMC
- Timepoints 0, 2, and 3.
My intention is to fit a GLM to test for differential abundance between the following groups:
- 26C vs. 30C for Saline treatment
- 26C vs. 30C for BMC treatment
- Saline vs. BMC at 26C
- Saline vs. BMC at 30C
In doing so, I would like to include as many samples as possible when estimating the dispersion, which is why I'm using a GLM instead of a Fisher's Exact Test on subsets of samples. I would also like to incorporate the ordered time information.
# Functions
read_dataframe = function(path, sep="\t") {
    # First column holds the sample IDs and becomes the row names
    df = read.table(path, sep=sep, row.names=1, header=TRUE, check.names=FALSE)
    return(df)
}
# Counts
X = read_dataframe("https://pastebin.com/raw/J7kmL8Ly")
# OG0000000 OG0000001 OG0000002 OG0000003 OG0000004
# T2_10_SALINE_TEMP-PE-D710-D505-1_S10 16909 55 5382 5894 1964
# T2_11_BMC_CONTROL-PE-D711-D505-1_S11 24296 60 2772 3962 1374
# T2_12_BMC_CONTROL-PE-D712-D505-1_S12 24619 60 7351 5389 560
# T2_13_BMC_CONTROL-PE-D701-D506-1_S13 22420 15 2172 2778 930
# T2_14_BMC_CONTROL-PE-D702-D506-1_S14 20049 82 4655 6211 553
# Metadata
df_metadata = read_dataframe("https://pastebin.com/raw/PANaC3r5")
# temperature treatment collection_time_numeric
# 1_T0_RNA-PE-D711D501-1_S143 26C NaN 0
# 2_T0_RNA-PE-D709D506-1_S134 26C NaN 0
# 4_T0_RNA-PE-D709D505-1_S133 26C NaN 0
# 5_T0_RNA-PE-D709D504-1_S132 26C NaN
If the T0 timepoint is throwing everything off, then it can be removed.
I'm trying to figure out how to do this from the edgeR User's Guide: https://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf
However, my situation isn't described there, so please no RTFM responses. If I make a design matrix, it will basically be a binary vector for Treatment_BMC, a binary vector for Temperature_30C, and a numeric vector for the collection time.
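To make that description concrete, here is a minimal sketch of such a design matrix in base R. The `meta` table is a made-up stand-in for `df_metadata` (only the column names match, the values are hypothetical), and T0 is left out because its treatment is NaN.

```r
# Toy metadata (hypothetical values; the real table is df_metadata without T0)
meta = data.frame(
    temperature = factor(c("26C", "26C", "30C", "30C", "26C", "26C", "30C", "30C"),
                         levels = c("26C", "30C")),
    treatment = factor(c("Saline", "BMC", "Saline", "BMC", "Saline", "BMC", "Saline", "BMC"),
                       levels = c("Saline", "BMC")),
    collection_time_numeric = c(2, 2, 2, 2, 3, 3, 3, 3)
)

# Additive design: intercept, one 0/1 column per factor, and numeric time
design_additive = model.matrix(~ treatment + temperature + collection_time_numeric,
                               data = meta)
print(colnames(design_additive))
print(design_additive)
```

In this additive form the temperature column measures one 26C vs. 30C effect shared by both treatments, so it cannot by itself separate the Saline-only comparison from the BMC-only one.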
If I use this as the design matrix when estimating the dispersion, then how would I do, for example, the first comparison above, where I test 26C vs. 30C for just the Saline treatment? Does it not make sense to estimate the dispersion from all of the samples?
I could use some guidance here.
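One possible way to set this up, sketched under assumptions rather than taken from the guide for this exact case, is to merge temperature and treatment into a single four-level group factor, keep time as a numeric covariate, and express each comparison as a contrast between two group columns. The `meta` table, the `group` column, and the `make_contrast` helper below are hypothetical illustrations; the edgeR calls are shown only as comments and assume the count table `X` has samples in rows and matches the metadata row order.

```r
# Toy metadata (hypothetical values; T0 dropped because its treatment is NaN)
meta = data.frame(
    temperature = c("26C", "26C", "30C", "30C", "26C", "26C", "30C", "30C"),
    treatment = c("Saline", "BMC", "Saline", "BMC", "Saline", "BMC", "Saline", "BMC"),
    collection_time_numeric = c(2, 2, 2, 2, 3, 3, 3, 3)
)

# One factor level per temperature/treatment combination
meta$group = factor(paste(meta$temperature, meta$treatment, sep = "_"),
                    levels = c("26C_Saline", "30C_Saline", "26C_BMC", "30C_BMC"))

# No intercept: one column per group, plus numeric time
design = model.matrix(~ 0 + group + collection_time_numeric, data = meta)

# Hypothetical helper: +1 on one column, -1 on another, 0 elsewhere
make_contrast = function(design, plus, minus) {
    v = setNames(numeric(ncol(design)), colnames(design))
    v[plus] = 1
    v[minus] = -1
    return(v)
}

contrasts = list(
    temp_in_saline = make_contrast(design, "group30C_Saline", "group26C_Saline"),
    temp_in_bmc    = make_contrast(design, "group30C_BMC", "group26C_BMC"),
    bmc_at_26C     = make_contrast(design, "group26C_BMC", "group26C_Saline"),
    bmc_at_30C     = make_contrast(design, "group30C_BMC", "group30C_Saline")
)
print(contrasts$temp_in_saline)

# With edgeR installed, the fit would then look roughly like this (not run here):
# y = edgeR::DGEList(counts = t(X))      # transpose to features x samples
# y = edgeR::calcNormFactors(y)
# y = edgeR::estimateDisp(y, design)     # dispersion estimated from all samples
# fit = edgeR::glmQLFit(y, design)
# res = edgeR::glmQLFTest(fit, contrast = contrasts$temp_in_saline)
```

The point of this layout is that the dispersion is estimated once from every sample in the design, while each contrast only compares the two groups of interest.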
Why have you posted this twice? Please see Understanding why `Design matrix not of full rank. The following coefficients not estimable` during GLM
I tried deleting the other one but couldn't find a delete option. Is it possible to delete the other post that you referenced? Apologies.