Hi, I have samples from 4 plant types (WT, KO1, KO2 and OE1), each mock-treated and stress-treated, with three replicates per condition. That makes 24 samples in total (4 plant types × 2 treatments × 3 replicates). I would like to analyse differential gene expression for each plant type before treatment (Mock) and after treatment. I aligned the reads to the reference genome with STAR and then obtained a count.txt file for each sample with featureCounts. Now I would like to run the differential expression analysis with edgeR, but I am having trouble creating the matrix because I have a large number of samples. Could you please help me with that?
Thanks
What have you tried in terms of your model matrix? What are the comparisons you would like to make?
Thank you, Devon Ryan. I would like to compare the plant types (WT vs KO1, WT vs KO2, WT vs OE1), in both Mock and Treated samples.
So a design of
~group+treatment
would work. The paper at this link explains how to use edgeR and how to set up a contrast matrix properly, which will help you make the correct comparisons:
https://f1000research.com/articles/5-1408/v3
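To make the suggested design concrete, here is a minimal sketch in base R of a sample table for the 24 libraries and the resulting model matrix. The column names (`group`, `treatment`, `replicate`, `condition`), the sample naming scheme and the choice of WT and Mock as reference levels are assumptions for illustration, not something stated in the thread. The second design (one column per plant type and treatment combination) is an alternative layout, shown only for comparison.

```r
# Sample table for the 24 libraries described in the question.
# Column names and sample names are illustrative assumptions.
samples <- expand.grid(
  replicate = 1:3,
  treatment = c("Mock", "Treated"),
  group     = c("WT", "KO1", "KO2", "OE1"),
  stringsAsFactors = FALSE
)

# The first level of each factor is the reference (assumed: WT and Mock).
samples$group     <- factor(samples$group, levels = c("WT", "KO1", "KO2", "OE1"))
samples$treatment <- factor(samples$treatment, levels = c("Mock", "Treated"))
rownames(samples) <- paste(samples$group, samples$treatment,
                           samples$replicate, sep = "_")

# Additive design from the thread: ~ group + treatment
design <- model.matrix(~ group + treatment, data = samples)
print(colnames(design))

# Alternative layout: one coefficient per plant type x treatment combination,
# which allows contrasts such as KO1.Treated - WT.Treated.
samples$condition <- factor(paste(samples$group, samples$treatment, sep = "."))
design_cond <- model.matrix(~ 0 + condition, data = samples)
colnames(design_cond) <- levels(samples$condition)
print(colnames(design_cond))
```

With these reference levels, the `groupKO1` coefficient of the additive design corresponds to KO1 vs WT, and `treatmentTreated` to Treated vs Mock. Note that the additive design assumes the treatment effect is the same in every plant type. In edgeR, a single coefficient can be tested with `glmQLFTest(fit, coef = "groupKO1")`, and contrasts between columns of the second design can be built with limma's `makeContrasts()`.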
If you don't need the KO1 vs KO2 or KO2 vs OE1 comparisons, it may be better to run the DEG analysis separately on WT vs KO1, then WT vs KO2, etc.
Analysing everything in one go is better in that you can compare everything to everything. The downside is that it is hard to find a single threshold for removing lowly expressed or unexpressed genes that suits all samples, so you might discard important genes. But if you are not stringent enough, you'll end up with incorrect normalisation and scaling factors.
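As a rough illustration of the filtering trade-off, the sketch below merges per-sample featureCounts tables into one count matrix and then filters the combined matrix once. `read_featurecounts` is a hypothetical helper, not part of any package. The demo files, gene IDs, counts, the CPM cutoff of 1 and the minimum of 3 samples (the replicate number in the question) are all made-up assumptions. It relies on the usual featureCounts layout: a leading `#` comment line, then Geneid, Chr, Start, End, Strand, Length and the count column.

```r
# Hypothetical helper: merge per-sample featureCounts tables into one matrix.
# `files` is a named character vector; names become the column names.
read_featurecounts <- function(files) {
  tabs <- lapply(files, read.delim, comment.char = "#",
                 check.names = FALSE, stringsAsFactors = FALSE)
  gene_ids <- tabs[[1]]$Geneid
  # All files must list the same genes in the same order.
  same_genes <- vapply(tabs, function(tab) identical(tab$Geneid, gene_ids),
                       logical(1))
  stopifnot(all(same_genes))
  counts <- do.call(cbind, lapply(tabs, function(tab) tab[[7]]))
  dimnames(counts) <- list(gene_ids, names(files))
  counts
}

# Demo only: write six tiny made-up files standing in for real count.txt files.
demo_dir <- tempfile("fc_demo_")
dir.create(demo_dir)
sample_names <- paste0(rep(c("WT_Mock", "WT_Treated"), each = 3), "_", 1:3)
files <- vapply(sample_names, function(s) {
  treated <- grepl("Treated", s)
  tab <- data.frame(
    Geneid = paste0("g", 1:4), Chr = "1", Start = 1L, End = 1000L,
    Strand = "+", Length = 1000L,
    # g1: high everywhere, g2: never seen, g3: Treated only, g4: very low
    count = c(2000000L, 0L, if (treated) 300L else 0L, 1L)
  )
  names(tab)[7] <- paste0(s, ".bam")
  path <- file.path(demo_dir, paste0(s, "_count.txt"))
  writeLines("# Program:featureCounts (demo header line)", path)
  suppressWarnings(write.table(tab, path, append = TRUE, sep = "\t",
                               quote = FALSE, row.names = FALSE))
  path
}, character(1))

counts <- read_featurecounts(files)

# Filter once on the combined matrix: keep genes with CPM above the cutoff
# in at least as many samples as the smallest group (assumed: 3 replicates).
cpm_mat  <- t(t(counts) / colSums(counts)) * 1e6
keep     <- rowSums(cpm_mat > 1) >= 3
filtered <- counts[keep, , drop = FALSE]
print(rownames(filtered))
```

Requiring expression in only as many samples as the smallest group, rather than in all samples, keeps genes like `g3` that are switched on in a single condition. In edgeR itself, `filterByExpr()` applies a similar group-aware rule, and `calcNormFactors()` is typically run after filtering.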