Hi all, I would appreciate your help in better understanding how to analyse our shRNA-Seq data. We have an shRNA library from mouse. Similar to Fig. 1 of this paper, we have two conditions (WT, KO) and two stages (before [b], after [a]), each with multiple replicates.
before samples - WT_b and KO_b
after samples - WT_a and KO_a
Also, we are doing a loss-of-function analysis, so we would like to see if there is a dropout in the KO compared to the WT.
I have searched for tools and found the edgeR tutorial, which gives several examples. But all these examples compare only two conditions against each other. I have a read count table, produced by mapping with segemehl and quantifying the samples with HTseq-count.
Do I need to take all four conditions into account, or should I only compare the after samples (KO_a vs. WT_a) to see if there are dropouts?
Should my experimental design include all the samples or only the last two?
Should it be something like this:
(KO_b / KO_a) - (WT_b / WT_a)
if the columns of my sampleData are condition and stage, like this:
sample condition stage
WT_1 WT Input
...
WT_6 WT after
...
WT_10 WT after
KO_1 KO Input
...
KO_10 KO after
I would appreciate an idea of how to create the design matrix. Would model.matrix(~stage) be enough, or would a more complex design such as model.matrix(~condition + condition:stage) be necessary here?
thanks in advance
May I ask: do you want to compare the conditions with each other, or do you have a control to compare against?
Regarding the design matrix: yes, if you are working with multiple conditions, the second command will come in handy.
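To make the design question concrete, here is a minimal sketch in base R of how a two-factor design with an interaction could be set up. The sample names, the three replicates per group and the level names ("before", "after") are assumptions for illustration, not the real sample sheet. With WT and "before" as reference levels, the interaction column should correspond to (KO_after - KO_before) - (WT_after - WT_before) on the log scale, which is one way to express a KO-specific dropout.

```r
# Hypothetical sample sheet: 2 conditions x 2 stages, 3 replicates each.
# Names and replicate numbers are made up for illustration.
sampleData <- data.frame(
  sample    = paste0(rep(c("WT", "KO"), each = 6), "_", 1:6),
  condition = factor(rep(c("WT", "KO"), each = 6), levels = c("WT", "KO")),
  stage     = factor(rep(rep(c("before", "after"), each = 3), times = 2),
                     levels = c("before", "after"))
)

# Factorial design: intercept, condition, stage and their interaction.
# The last column is the difference of the after-vs-before change
# between KO and WT.
design <- model.matrix(~ condition * stage, data = sampleData)
print(colnames(design))

# Equivalent "one coefficient per group" parametrisation, where the same
# question is written as an explicit contrast between the four group means.
group <- factor(paste(sampleData$condition, sampleData$stage, sep = "_"),
                levels = c("WT_before", "WT_after", "KO_before", "KO_after"))
design_group <- model.matrix(~ 0 + group)
colnames(design_group) <- levels(group)

# (KO_after - KO_before) - (WT_after - WT_before)
contrast <- c(WT_before = 1, WT_after = -1, KO_before = -1, KO_after = 1)
print(contrast)
```

The nested form from the question, ~condition + condition:stage, fits the same model but reports one stage effect per condition; the difference between those two effects is the same quantity as the interaction above. In edgeR, a coefficient or contrast like this would typically be tested after fitting all four groups together (for example with glmQLFTest, via its coef or contrast argument), rather than comparing only the two "after" groups.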
This is the point - I'm not sure which samples to take. The end goal is to identify genes which show a dropout in the comparison MUT vs. WT. But do I need to take all four sample groups or a comparison of the two experimental sample groups?
DESeq2 is very similar to edgeR, and its tutorial has a very nice walkthrough of multi-factor designs, including common errors and how to address them. It's worth a read, and if you aren't married to edgeR, I found DESeq2 to be more user-friendly when I was starting out:
http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
But does it also work well with shRNA-Seq? Only a small number of genes is analysed, and the fraction of changed (differentially expressed) genes might be higher than what DESeq2 assumes to begin with.