Question

Pairwise comparison in DESeq2

8.3 years ago

EVR ▴ 610

Hi,

I have RNA-seq data from several time points. At every time point I have 4 samples (Control, Knock-down_1, Knock-down_2, Knock-down_3), and I want to compare each knock-down sample to its Control sample. Since DESeq2 compares two sets of samples to identify differentially expressed genes, how should the analysis be carried out:

 a) Include all samples from that particular time point, and later use contrasts to find the differentially expressed genes between specific samples
                                   OR
 b) Include only the two samples for which you want to find the differentially expressed genes, and finish the analysis.

Thanks in advance

RNA-Seq DESeq2

Written 8.3 years ago by EVR ▴ 610 • updated 8.3 years ago by Carlo Yague 9.0k

score 2 · Answer 1 · 2017-03-08

8.3 years ago

Carlo Yague 9.0k

The correct answer is "c) Include all samples from all time points", because that will give you the best gene-level dispersion estimates.

There is a great tutorial here that explains how to do a time-course analysis with mutants in DESeq2.
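A rough way to see why option c helps is to count the residual degrees of freedom left for estimating the noise under a full time × condition model in each option. This is a Python/numpy illustration, not DESeq2 code (DESeq2 is an R package): the time-point labels come from this thread, but 3 replicates per group is an assumption, and `make_samples`, `interaction_design` and `residual_df` are hypothetical helpers. DESeq2 fits negative binomial GLMs and also shrinks gene-wise dispersions toward a trend fitted across genes, so residual degrees of freedom are only a proxy for how much information each gene has.

```python
import numpy as np
from itertools import product

# Hypothetical layout: time points mentioned in the thread, and an ASSUMED
# 3 replicates per (time, condition) group.
TIME_POINTS = ["3h", "day1", "day3", "day7"]
CONDITIONS = ["Control", "Knock-down_1", "Knock-down_2", "Knock-down_3"]
REPS = 3


def make_samples(time_points, conditions, reps):
    """List (time, condition) labels, one entry per sample."""
    return [(t, c) for t, c in product(time_points, conditions) for _ in range(reps)]


def interaction_design(samples):
    """One indicator column per (time, condition) cell.

    Spans the same column space as ~ time + condition + time:condition.
    """
    cells = sorted(set(samples))
    col = {cell: j for j, cell in enumerate(cells)}
    X = np.zeros((len(samples), len(cells)))
    for i, cell in enumerate(samples):
        X[i, col[cell]] = 1.0
    return X


def residual_df(X):
    """Samples minus fitted parameters: the df left for estimating noise."""
    return X.shape[0] - np.linalg.matrix_rank(X)


# b) Control vs Knock-down_1 at a single time point
df_b = residual_df(interaction_design(make_samples(["day1"], CONDITIONS[:2], REPS)))
# a) all four conditions at a single time point
df_a = residual_df(interaction_design(make_samples(["day1"], CONDITIONS, REPS)))
# c) all conditions at all time points
df_c = residual_df(interaction_design(make_samples(TIME_POINTS, CONDITIONS, REPS)))

print(df_b, df_a, df_c)  # 4 8 32
```

With these assumed numbers, the per-gene noise estimate rests on 4, 8 and 32 residual degrees of freedom for options b, a and c respectively.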

Thanks for your comment. But I am not comparing samples from one time point with samples from another time point; I am comparing samples within a time point, so why include samples from other time points? Won't they influence the values of the other samples? For example, is it worth having the counts of the day 7 samples influence the counts of the day 1 samples?

Reply • 8.3 years ago by EVR ▴ 610
As Carlo said, "...because it will give you the best gene-level dispersion estimate". Fit the model using all of your data and you will get a better estimate of the mean-variance trend for any given gene. With this estimate, you can assess differences between your experimental arms at any given time point more reliably than if you analysed only the samples from that time point.

Reply • 8.3 years ago by russhh 5.8k
Thank you russhh. I understand that using all samples from all time points gives a better gene-level estimate. But I still can't understand one thing. For example, won't the actual expression (raw counts) of gene x at 3 hours be affected by the actual expression (raw counts) of the same gene x at day 7?

Reply • 8.3 years ago by EVR ▴ 610
The expression will be unaffected if you include all time points, but you will be more accurate when assessing the significance of a difference in expression.

That is, as long as your model takes into account time and the interaction between time and strain. If you don't include time in the model, your time points will be treated as replicates and the "expression" will be affected.
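Carlo's point can be checked with a toy example. This is a simplified Python/numpy sketch that assumes ordinary least squares on simulated log-expression for a single gene, instead of DESeq2's negative binomial GLM; the time points, effect sizes, noise level and the helper `fit_cell_means` are all hypothetical. With one parameter per (time, condition) cell, which is equivalent to including time, condition and their interaction, the knock-down vs Control difference at day1 comes out the same whether the model is fitted on all time points or on day1 alone, while leaving time out of the model pools the time points as if they were replicates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single gene: baseline changes a lot over time, and the
# knock-down effect (on a log scale) differs between time points.
TIMES = ["3h", "day1", "day7"]
CONDS = ["Control", "KD"]
REPS = 3
baseline = {"3h": 5.0, "day1": 6.0, "day7": 9.0}
kd_effect = {"3h": -1.0, "day1": -0.5, "day7": -2.0}

samples, y = [], []
for t in TIMES:
    for c in CONDS:
        for _ in range(REPS):
            mu = baseline[t] + (kd_effect[t] if c == "KD" else 0.0)
            samples.append((t, c))
            y.append(mu + rng.normal(scale=0.1))
y = np.array(y)


def fit_cell_means(samples, y):
    """OLS with one indicator column per (time, condition) cell.

    Equivalent to ~ time + condition + time:condition; coefficients are cell means.
    """
    cells = sorted(set(samples))
    X = np.array([[1.0 if s == cell else 0.0 for cell in cells] for s in samples])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(cells, coef))


# KD - Control at day1, fitted on all time points
fit_all = fit_cell_means(samples, y)
diff_all = fit_all[("day1", "KD")] - fit_all[("day1", "Control")]

# Same contrast, fitted on the day1 samples only
idx = [i for i, s in enumerate(samples) if s[0] == "day1"]
fit_day1 = fit_cell_means([samples[i] for i in idx], y[idx])
diff_day1 = fit_day1[("day1", "KD")] - fit_day1[("day1", "Control")]

# Time left out of the model (~ condition): time points act as replicates
is_kd = np.array([s[1] == "KD" for s in samples])
diff_no_time = y[is_kd].mean() - y[~is_kd].mean()

print(f"all time points, interaction model: {diff_all:.3f}")
print(f"day1 samples only:                  {diff_day1:.3f}")
print(f"time not in the model:              {diff_no_time:.3f}")
```

In DESeq2 itself the fold changes can still shift slightly when other time points are added, because size factors and dispersions are estimated from all samples in the dataset; the sketch only shows that the other time points do not leak into the per-cell means.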

Reply • 8.3 years ago by Carlo Yague 9.0k
Admittedly, the counts at day 3 and day 7 will be statistically/biologically dependent. But how to account for that dependence is not the question you originally posed. Data from any quantitative experiment can be viewed as comprising signal and noise. You'd hope that, although there may be dependence between the fitted values for your different samples, the noise is uncorrelated between those samples. And it is your ability to estimate the amount of noise that improves when you include all of your different time points.
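The argument that the other time points sharpen the noise estimate, even when the signal differs between them, can be illustrated with a small simulation. This Python/numpy sketch assumes Gaussian noise with a known SD that is independent between samples, and a hypothetical layout of 4 time points × 4 conditions × 3 replicates; `pooled_variance` is a hypothetical helper, and DESeq2's actual dispersion estimation for count data works differently. Each cell's own mean is removed before the variance is estimated, so the signal (here arbitrary random cell means) does not enter the noise estimate.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical layout and noise level (assumptions, not from the thread)
N_TIMES, N_CONDS, REPS = 4, 4, 3
SIGMA = 0.5
N_SIM = 5000


def pooled_variance(cells):
    """Variance left after removing each cell's own mean (full interaction fit).

    `cells` has shape (n_cells, reps); residual df = n_cells * (reps - 1).
    """
    resid = cells - cells.mean(axis=1, keepdims=True)
    df = cells.shape[0] * (cells.shape[1] - 1)
    return (resid ** 2).sum() / df


one_tp, all_tp = [], []
for _ in range(N_SIM):
    # Signal: arbitrary cell means; rows are ordered time point by time point.
    means = rng.normal(loc=8.0, scale=2.0, size=(N_TIMES * N_CONDS, 1))
    # Noise: independent between samples, same SD everywhere (Gaussian assumption).
    data = means + rng.normal(scale=SIGMA, size=(N_TIMES * N_CONDS, REPS))
    one_tp.append(pooled_variance(data[:N_CONDS]))  # first time point only, df = 8
    all_tp.append(pooled_variance(data))            # all time points, df = 32

one_tp, all_tp = np.array(one_tp), np.array(all_tp)
print(f"true noise variance: {SIGMA ** 2:.3f}")
print(f"one time point:  mean {one_tp.mean():.3f}, sd {one_tp.std():.3f}")
print(f"all time points: mean {all_tp.mean():.3f}, sd {all_tp.std():.3f}")
```

Both estimators are centred on the true variance, but under these Gaussian assumptions the one using all time points should vary about half as much between simulations, since its residual degrees of freedom are 32 instead of 8.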

Reply • 8.3 years ago by russhh 5.8k