7.7 years ago
EVR
Hi,
I have RNA-seq data collected at several time points. For every time point I have 4 samples (Control, Knock-down_1, Knock-down_2, Knock-down_3), and I want to compare each knock-down sample to its control sample. Since DESeq2 compares two sets of samples to call differentially expressed genes, how should the analysis be carried out:
a) Include all samples from this particular time point and then use contrasts to find the differentially expressed genes between specific samples
OR
b) Include only the two samples between which you want to find the differentially expressed genes and run the analysis on those alone.
Thanks in advance
Thanks for your comment. But I am not comparing the samples of one time point with those of another; I am comparing samples within a time point, so why include samples from other time points? Won't they influence the values of the other samples? For example, does it make sense for the counts of samples from day7 to influence the counts of samples from day1?
Like Carlo said, "..because it will give you the best gene-level dispersion estimate". Fit the model using all of your data and it will give you a better estimate of the mean/variance-trend for any given gene. With this estimate, you can better estimate differences between your experimental arms at any given timepoint than if you were analysing just the samples from that timepoint.
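To illustrate why pooling helps, here is a minimal Python simulation. It is not DESeq2 itself: it assumes Gaussian noise with one true variance shared by every group (DESeq2 instead fits negative binomial dispersions and shares information across genes), and the group means, replicate counts, and function names are all made up for the sketch. It compares how far the noise estimate lands from the truth when it is computed from one timepoint versus from all timepoints.

```python
import random
import statistics


def within_group_variance(groups):
    """Pooled within-group variance: residual sum of squares / residual df."""
    rss = 0.0
    df = 0
    for values in groups:
        mean = statistics.fmean(values)
        rss += sum((v - mean) ** 2 for v in values)
        df += len(values) - 1
    return rss / df


def simulate(n_timepoints=4, n_reps=3, sigma=1.0, n_trials=2000, seed=1):
    """Return (RMSE using one timepoint, RMSE using all timepoints)."""
    rng = random.Random(seed)
    true_var = sigma ** 2
    sq_err_single = []
    sq_err_pooled = []
    for _ in range(n_trials):
        groups_by_time = []
        for t in range(n_timepoints):
            # Hypothetical signal: every (timepoint, condition) group has
            # its own mean, but all groups share the same noise level.
            control = [rng.gauss(5.0 + t, sigma) for _ in range(n_reps)]
            knockdown = [rng.gauss(3.0 + 2 * t, sigma) for _ in range(n_reps)]
            groups_by_time.append([control, knockdown])
        # Noise estimated from the first timepoint only.
        single = within_group_variance(groups_by_time[0])
        # Noise estimated from every group at every timepoint.
        pooled = within_group_variance(
            [group for timepoint in groups_by_time for group in timepoint]
        )
        sq_err_single.append((single - true_var) ** 2)
        sq_err_pooled.append((pooled - true_var) ** 2)
    rmse_single = statistics.fmean(sq_err_single) ** 0.5
    rmse_pooled = statistics.fmean(sq_err_pooled) ** 0.5
    return rmse_single, rmse_pooled


rmse_single, rmse_pooled = simulate()
print("RMSE of variance estimate, one timepoint:", rmse_single)
print("RMSE of variance estimate, all timepoints:", rmse_pooled)
```

The group means never enter the noise estimate, because each group is centred on its own mean first. The extra timepoints only add residual degrees of freedom, which is why the pooled estimate is more stable.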
Thank you, russhh. I understand that using all samples from all time points gives a better gene-level estimate. But there is one thing I still can't understand. For example, won't the actual expression (raw counts) of gene x at 3 hours be affected by the actual expression (raw counts) of the same gene x at day7?
The expression will be unaffected if you take all time points, but you will be more accurate when assessing the significance of a difference in expression.
That is, as long as your model takes into account the time and the interaction between the time and the strain. If you don't include time in the model, your time points will be treated as replicates and the "expression" will be affected.
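A small Python sketch of that point, using made-up log-scale expression values for one gene (the numbers, labels, and helper names are hypothetical, and this is plain group-mean arithmetic rather than a DESeq2 fit). When every time-by-condition group gets its own mean, the 3h knock-down effect is the same as if only the 3h samples had been analysed. When time is dropped, the timepoints are pooled as replicates, so both the effect and the noise estimate change.

```python
import statistics

# Hypothetical log-expression of one gene, two replicates per group.
data = {
    ("3h", "control"): [5.0, 5.2],
    ("3h", "kd"): [4.0, 4.1],
    ("day7", "control"): [9.0, 9.1],
    ("day7", "kd"): [6.0, 6.2],
}


def residual_variance(groups):
    """Residual sum of squares around each group mean / residual df."""
    rss = 0.0
    df = 0
    for values in groups:
        mean = statistics.fmean(values)
        rss += sum((v - mean) ** 2 for v in values)
        df += len(values) - 1
    return rss / df


# Model with time and its interaction: one mean per (time, condition) group.
effect_3h_full = statistics.fmean(data[("3h", "kd")]) - statistics.fmean(
    data[("3h", "control")]
)
var_full = residual_variance(data.values())

# Analysing the 3h samples on their own gives the same effect estimate.
effect_3h_alone = statistics.fmean([4.0, 4.1]) - statistics.fmean([5.0, 5.2])

# Model without time: timepoints are pooled as if they were replicates.
control_all = data[("3h", "control")] + data[("day7", "control")]
kd_all = data[("3h", "kd")] + data[("day7", "kd")]
effect_no_time = statistics.fmean(kd_all) - statistics.fmean(control_all)
var_no_time = residual_variance([control_all, kd_all])

print("3h effect, full model:", effect_3h_full)
print("3h effect, 3h samples only:", effect_3h_alone)
print("effect, time ignored:", effect_no_time)
print("residual variance, full model:", var_full)
print("residual variance, time ignored:", var_no_time)
```

In the time-ignored version the day7 values leak into both the effect and the variance, which is the situation described above where the timepoints are seen as replicates.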
Admittedly, the counts at day3 and day7 will be statistically/biologically dependent. But how to account for that dependence is not the question you originally posed. Data from any quantitative experiment can be viewed as comprising signal and noise. You'd hope that, although there may be dependence between the fitted values for your different samples, the noise is uncorrelated between those samples. And it's your ability to estimate the amount of noise that improves when you include all of your different timepoints.