Question

Correct for confounding variable (variability in knockdown efficiency) in paired t-test for change in gene expression

7.4 years ago

Scott ▴ 110

This is primarily a statistics question, but the biological context may help frame the problem.

There are two samples treated with different shRNAs. 1 = control shRNA, 2 = target shRNA.

Then PCR is performed on several other (non-target) genes of interest. Several genes change in expression between control and target shRNA treatments, but the extent to which they change is highly correlated with the extent to which the target gene has been knocked down.

Since baseline gene expression/Ct values are fairly variable between biological replicates, a paired t-test is appropriate. I have done this for each gene of interest and then corrected the P values for multiple testing. This works. However, I would like to increase my power by including more replicates of the experiment, which have variable knockdown efficiency, and have the efficiency of knockdown taken into account. I basically need a way to correct for a confounding variable in a paired t-test.

It seems my options may include: 1) ANCOVA. This would likely work great for raw Ct values, but does not seem to easily accommodate paired data. 2) Multiple linear regression. Any advice on how to actually go about doing this would be appreciated (if it is indeed the recommended method)

Thanks!

statistics t-test gene qPCR ANCOVA • 2.6k views

ADD COMMENT • link updated 7.4 years ago by Devon Ryan 105k • written 7.4 years ago by Scott ▴ 110

Am I correct in assuming that the pairing is between a control shRNA and a target shRNA in the same sample (it's unclear what the actual relevant system is in this case)?

ADD REPLY • link 7.4 years ago by Devon Ryan 105k

No. One sample has been treated with control shRNA and another sample treated with target shRNA. They are treated in parallel though, hence the pairing. Hope that makes sense.

ADD REPLY • link 7.4 years ago by Scott ▴ 110

score 1 · Answer 1 · 2018-01-11

If you loaded your data into R and made a data frame that looked something like:

sample  batch  treatment  efficiency  Ct
s1      b1     control    0.0         30
s2      b1     shRNA      0.3         25
s3      b2     control    0.0         29
s4      b2     shRNA      0.2         27
s5      b3     control    0.0         30
s6      b3     shRNA      0.8         20
s7      b4     control    0.0         29
s8      b4     shRNA      0.9         21

Then the linear model for what you want is lm(Ct ~ batch + efficiency), and you're interested in whether the coefficient for efficiency differs from 0. You don't need the treatment column; I've put it in so it's easy to see that all control shRNA samples have an efficiency of 0. batch designates your pairs; feel free to rename it.
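As a minimal sketch, the example table can be entered and the model fitted as follows. The object names dat, fit and eff are just illustrative choices, and the numbers are the toy values from the table, not real data.

```r
# Toy data copied from the example table; `dat`, `fit`, `eff` are illustrative names.
dat <- data.frame(
  sample     = paste0("s", 1:8),
  batch      = factor(rep(paste0("b", 1:4), each = 2)),  # b1, b1, b2, b2, ...
  treatment  = rep(c("control", "shRNA"), times = 4),
  efficiency = c(0.0, 0.3, 0.0, 0.2, 0.0, 0.8, 0.0, 0.9),
  Ct         = c(30, 25, 29, 27, 30, 20, 29, 21)
)

# batch absorbs the baseline Ct of each pair; the efficiency coefficient
# is the change in Ct per unit of knockdown efficiency within a pair.
fit <- lm(Ct ~ batch + efficiency, data = dat)

# Row of the coefficient table for efficiency:
# estimate, standard error, t value and two-sided p-value.
eff <- coef(summary(fit))["efficiency", ]
print(eff)
```

With these toy values the estimated slope is negative (Ct drops as efficiency rises), which is why the test of interest is two-sided rather than "greater than 0".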

At the end of the day, this is just a tweaked version of a paired t-test, where instead of testing if the paired difference between shRNA and control is 0, you test whether this difference as a function of the efficiency has a non-zero slope. You can also do this somewhat manually, by subtracting the control from the shRNA within each pair and then regressing that difference vs. KD efficiency.
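The manual route could look roughly like this, again using the toy values from the example table. The names pairs, dCt and fit_diff are made up for illustration. One assumption is worth flagging: the regression is fitted without an intercept, since that is the form that matches lm(Ct ~ batch + efficiency) when every control has an efficiency of 0.

```r
# One row per pair, toy values from the example table; names are illustrative.
pairs <- data.frame(
  batch      = paste0("b", 1:4),
  efficiency = c(0.3, 0.2, 0.8, 0.9),  # KD efficiency of the shRNA sample
  ct_control = c(30, 29, 30, 29),
  ct_shRNA   = c(25, 27, 20, 21)
)

# Paired difference: shRNA minus control within each pair.
pairs$dCt <- pairs$ct_shRNA - pairs$ct_control

# "0 +" drops the intercept, i.e. assumes zero knockdown gives zero change.
fit_diff <- lm(dCt ~ 0 + efficiency, data = pairs)
print(coef(summary(fit_diff)))
```

Keeping the intercept (dCt ~ efficiency) would fit a different model, one that also allows an efficiency-independent shift caused by the shRNA treatment itself, and it costs one more degree of freedom.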

FYI, the p-value for KD efficiency (or shRNA treatment controlling for efficiency, if you prefer) in the above example is ~0.004.