Hi macmath,
I'd suggest looking into limma and edgeR for analysis.
You have to set up the coefficients and contrasts for the linear model. It looks like you have 2 experimental factors and one error term, one being KO status, and the second being time, and the last being known batch groups. You want to correct for batch as it could potentially effect your result. If you don't know the batch status, you can use something like SVA to do a generalized batch correction.
So starting off let's generate the model. I assume you probably have an excel file or csv file with each sample in a row and each column being either KO status, time, and sequencing batch. So read those in and generate a design for a linear model:
factor.table <- read.table("omics.csv", header=TRUE, row.names=1, sep="\t")
ko_status <- as.factor(factor.table$ko_status)
time_status <- factor.table$time_status
batch <- as.factor(factor.table$rnaseq_batch)
design <- model.matrix(~0 + ko_status + time_status + batch + ko_status:time_status)
So that's the basic design, and what it means is that the linear model will generate an intercept for each factor plus any interactions between KO and time status. You then generate contrasts to test for significant differences between defined groups, eg:
cont.matrix <- makeContrasts(KO_true = ko_status1 - ko_status2,
..., levels = design)
You'll have to play around with this a bit to generate the proper contrasts. There are working examples in the limma documentation which will help you figure it out, so do your reading.
Next you filter and normalized the data, then run the linear model:
y <- DGEList(counts = raw_counts)
y <- y[!rowSums(y$counts == 0) == ncol(raw_counts),]
A <- aveLogCPM(y)
y2 <- y[A>1,]
y3 <- calcNormFactors(y2, method = "TMM")
dge <- voomWithQualityWeights(y3, design=design, normalization="quantile", plot=FALSE)
fit <- lmFit(dge, design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit3 <- eBayes(fit2, trend = TRUE, robust = TRUE)
I believe fit2 gives you the main effects, and fit3 will give you the DEG for whichever contrast. I also believe you can use one of the fits to do a repeated measures anova, but I think you can flush out the effect using the interaction term in the design.
Thank you for the needful suggestions I have a doubt. I forgot to mention in the description of the question.
These KO are specific for each condition they are not common among the three
Not really sure what your objective is. Could you explain your groups in biological terminology.
WT = Wild-type, KO = knockout/mutant . The KO are not identical across the three conditions so these are separate experiments. Unclear how the time component fits in the picture.
To clarify, biologically the KO are same and I am a bit confused with the KO status, time, and sequencing batch. My files compose of count files generated by HTseq count for each biological replicate and its respective technical replicates.
12 weeks WT vs KO 4 biological replicates and 4 technical replicates i.e. 32 count files
4 weeks WT vs KO 3 biological replicates and 3 technical replicates i.e. 27 count files
0 weeks WT vs KO 3 biological and 3 technical replicates i.e. 27 count files
I would also like to know if the technical replicates should be collapsed?
and
Those two statements (made in two separate posts) are conflicting. Which of those statements is correct? I would think the KO must be identical otherwise the time component has no meaning since the three sets would be separate non-comparable experiments.
Technical replicates for sequencing are not needed since sequencing has become pretty standard (unless you used different sequencer models that use different chemistry for sequencing e.g. NextSeq,HiSeq). You could collapse them.
so what are you knocking out?