I have a 22-day time course experiment I conducted with two plant genotypes and five time points (0, 1, 7, 14, 22 dpi). I have manged to successfully complete some basic analysis of my RNAseq data. I now wanted to do some multi factor analysis but wasn't sure where to stat. Here is the layout of my samples:
RA - resistant, 0 dpi RB - resistant, 1dpi RC - resistant, 7dpi RD - resistant, 14 dpi RE - resistant, 22 dpi
SA - susceptible, 0 dpi SB - susceptible, 1 dpi SC - susceptible, 7 dpi SD - susceptible, 14 dpi SE - susceptible, 22 dpi
Here are the analyses I have done so far:
RA-SA RB-SB RC-RC RD-SD RE-SE
RA-RB RA-RC RA-RD RA-RE
SA-SB SA-SC SA-SD SA-SE
I am uploading my data using the tximport function. I have designed my sample file like so:
sample
timepoint treatment genotype
S1A_rep1 0 SA S
S1A_rep2 0 SA S
S1A_rep3 0 SA S
S1B_rep1 1 SB S
S1B_rep2 1 SB S
S1B_rep3 1 SB S
S1C_rep1 7 SC S
S1C_rep2 7 SC S
S1C_rep3 7 SC S
S1D_rep1 14 SD S
S1D_rep2 14 SD S
S1D_rep3 14 SD S
S1E_rep1 22 SE S
S1E_rep2 22 SE S
S1E_rep3 22 SE S
R1A_rep1 0 RA R
R1A_rep2 0 RA R
R1A_rep3 0 RA R
R1B_rep1 1 RB R
R1B_rep2 1 RB R
R1B_rep3 1 RB R
R1C_rep1 7 RC R
R1C_rep2 7 RC R
R1C_rep3 7 RC R
R1D_rep1 14 RD R
R1D_rep2 14 RD R
R1D_rep3 14 RD R
R1E_rep1 22 RE R
R1E_rep2 22 RE R
R1E_rep3 22 RE R
The specific four comparisons I want to make are the following:
(RB - SB) - (RA - SA) (compare RB and SB, then compare that to RA and SA, compared)
(RE + SE) - (RA + SA)
(RE + RD) - (RC + RB)
[(RE + RD) - (RC + RB)] - [(SE + SD) - (SC - SB)] (compare the early vs late response of the R plants to that of the susceptible)
Do I need to change my sample object to facilitate these comparisons or is there another piece of code I should be using?
Very nice answer, I just wanted to add that as far as I know, there is no clear consensus on [1] (calculating FDR across multiple contrasts). While making multiple contrasts increases the number of false positives, it does not necessarily increases the rate of false positive (FDR). There was an interesting discussion a few weeks ago on biostars, pointing to a few posts from Gordon Smyth (author of Limma/edgeR). He seems to stands for applying FDR to each contrast separately : What are the necessary and sufficient conditions for applying a multiple hypothesis correction? .
Thank you. I am a bit confusted about this:
Why would the RB and SA be add coefficients if I wanted to compare each genotype's 1dpi timepoint to its 0dpi timepoint and then compare those comparisons to each other?
It's just how the math works out, there's a reason we only think in terms of the left side of that equation. Your question is asking for the interaction between genotype and timepoint, which is the difference of differences on the left.
OK, so if I were to do the (RB - SB) - (RA - SA), would I use the following code:
Its just the result of mutlipling out the brackets:
The code you would you would be something like:
Although the precise things that would go in those first two lines would depend on what the coefficients are called. You could get the names to use by using
resultsNames(deseqData)
Got It! I figured out how to do the specific contrasts I wanted. I just produced the model matrix of my analysis with
model.matrix(design(dds), colData(dds))
and then labeled new objects based on how I labeled my treatments (RA = colMeans(mod.mat[deseqData$treatment == "RA", ])
, then ran the contrasts I had listed above. Thanks.