Pool or not to pool
2.7 years ago
Rafael Soler ★ 1.3k

Hello,

If we have 12 mutant and 12 control mice, and we want to sequence them and analyse them with DESeq2, which experiment would be better?

A) Sequence only 4 animals, each as a separate sample

B) Pool the 12 animals into 4 samples (3 each)

Either way, we end up with the same n = 4

Thank you!

DESeq2 Samples Pool Variability • 2.2k views

What do you mean by "mix"? Like pooling the RNA and making a single library out of mutant and control? If so, this would be terrible because you lose any information on the biological variability, which is the basis for any statistics.


Sorry, I will edit the question. The idea is to have 12 mutant mice and to pool them in groups of 3 into 4 samples. Is that better than sequencing only 4 animals?


Statistically, 12 is more powerful than 4, so that is to be preferred, but money, feasibility, and the amount of RNA you get per mouse will guide you. If you can do 12, do 12; otherwise do 4. It also depends on whether the effect you expect is large: if so, four might be enough; otherwise 12 might be required to have the power to see effects. We have projects in which we know that we need a larger n because the anticipated effects are small and variable per mouse, while sometimes we simply get very little RNA out of the populations we sort, so we have to pool mice. It really depends.
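To make the "large effect vs. small effect" point concrete, here is a minimal simulation sketch. It is not DESeq2: it assumes a single gene with normally distributed expression on a log-like scale, a plain two-sample t-test, and made-up effect sizes (1 and 3 standard deviations). The function names and parameters are illustrative only.

```python
import math
import random
import statistics

# Two-sided 5% critical values of Student's t for 6 and 22 degrees of freedom
T_CRIT = {6: 2.447, 22: 2.074}

def t_stat(ctrl, mut):
    """Two-sample t statistic with pooled variance (equal group sizes)."""
    n = len(ctrl)
    pooled_var = (statistics.variance(ctrl) + statistics.variance(mut)) / 2
    return (statistics.fmean(mut) - statistics.fmean(ctrl)) / math.sqrt(2 * pooled_var / n)

def power(n_per_group, effect, sigma=1.0, n_sim=4000, seed=1):
    """Fraction of simulated experiments in which the t-test rejects at 5%."""
    rng = random.Random(seed)
    crit = T_CRIT[2 * n_per_group - 2]
    hits = 0
    for _ in range(n_sim):
        ctrl = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        mut = [rng.gauss(effect, sigma) for _ in range(n_per_group)]
        hits += abs(t_stat(ctrl, mut)) > crit
    return hits / n_sim

# Hypothetical effect sizes, in units of the mouse-to-mouse standard deviation
results = {(n, e): power(n, e) for n in (4, 12) for e in (1.0, 3.0)}
for (n, e), p in sorted(results.items()):
    print(f"n = {n:2d} per group, effect = {e} SD: power ~ {p:.2f}")
```

Under these assumptions, a large effect is usually detected with 4 mice per group, while a small one is mostly missed unless n is increased.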


I think it's to save money. Instead of paying for 12 libraries, OP would be paying for 4 libraries.


Yes, the choice is between sequencing 4 samples that are each one mouse and sequencing 4 samples that are each a pool of 3 mice.


4 mice is better than 4 pools, since variation is happening at the mouse level. Make sure they're from different litters.


Can you explain what "variation is happening at the mouse level" means? Why is reducing the biological variability a bad thing? If you pool 3 animals into each of the 4 samples, don't you decrease the chance of having one sample that is an outlier, for example?

Thank you for the response and all the feedback :)


You want your results to represent the underlying biology as much as possible. When you pool mice, you're averaging out the signal. So your results are then differences in averaged-out signals. But you don't care about the biology of pools of mice, you care about the biology of mice in general. That's why you need to measure individual mice. For some reason people like to delude themselves into thinking that by using more mice and pooling them they're sampling more mice. If they actually sequenced those individual mice that would indeed be the case, but by pooling them they're just shooting themselves in the foot by hiding the underlying biology.


Devon, I may be missing something here but I think I disagree... Ultimately, the OP is interested in estimating the difference in gene expression between mutant and control (he doesn't say explicitly but "DESeq2" suggests so). If you pool several mice then each library is closer to the true value because of the averaging. If instead you are interested in the variability within control and within mutant then (I think) pooling is a bad idea.
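The "closer to the true value because of the averaging" claim can be illustrated with a small sketch. It assumes every mouse in a pool contributes an equal amount of RNA and that expression of the gene is normally distributed across mice (both are idealisations, and `simulate_sd` is a hypothetical helper). Under those assumptions, a pool of k mice has a standard deviation of roughly sigma / sqrt(k).

```python
import random
import statistics

def simulate_sd(pool_size, n_samples=20000, mu=10.0, sigma=1.0, seed=0):
    """Empirical SD of the value measured for one library.

    Each library is modelled as the mean of `pool_size` mice,
    i.e. every mouse is assumed to contribute the same amount of RNA.
    """
    rng = random.Random(seed)
    values = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(pool_size))
        for _ in range(n_samples)
    ]
    return statistics.stdev(values)

sd_single = simulate_sd(pool_size=1)
sd_pool3 = simulate_sd(pool_size=3)
print(f"SD across single-mouse libraries: {sd_single:.2f}")
print(f"SD across pools of 3 mice:        {sd_pool3:.2f}")
```

The spread between pooled libraries shrinks but does not vanish, so a pooled design still estimates some between-sample variability; what it can no longer tell you is the variability between individual mice.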


Ah, but the variability within them is necessary to discern if there are any real differences. Averaging out all of the real variability is just giving you false positives.


I don't see why you should get false positives after averaging. If there is some confounder associated with the mutant/control state (e.g. mutants handled by one technician and controls by another) and you don't know about it then either design is flawed.

As a thought experiment: there is a decent chance that the four control mice are males and the four mutants are females, or controls are young mice and mutants are old, and so on. You can control for these factors up to a certain point, but with four replicates there is only so much you can randomize. If you pool instead, factors like sex, age etc. are averaged out and you are left with the mutant-control difference you are interested in. Is this not a good thing?


I was thinking that if you do the pooling, each sample does not represent an animal but a pool of 3 animals. Although you might be getting closer to the "true" nature of the KO (for example), when it comes to getting the p-values and carrying out the statistics to assess the biological variability between samples, you would be "cheating" in a way. You would end up with more differentially expressed genes (because there is less biological variability between samples, which increases the significance), and these values would not reflect unique individuals but rather a modified quantity for which the statistic is not designed.


I'm not arguing in favour of pooling here, rather I think pooling has merits and I'm not convinced by the arguments so far even if they sound appealing. By your reasoning, bulk RNAseq is "cheating" because it ignores cell-to-cell variability. Cell variability is not less biologically relevant than the mouse-to-mouse variability, right? (For this reason I'm not too keen on the distinction between technical and biological replicates). If you had the option to sequence just one cell from each of 4 mice, would you do that rather than doing bulk RNAseq on millions of cells per mouse pooled together? I would say no if your question is estimating the difference between control and mutant.

Regarding "differentially expressed genes" I would keep in mind that that is not really a biological characteristic but rather a filter based on statistical significance (usually). Eventually, every gene is "differentially expressed" given enough replicates. Pooling will give you more genes passing the FDR cutoff (if something is really going on between conditions) and it will also give you effect sizes closer to the population mean. The same example again: if your controls are on average healthier than the mutants for reasons unrelated to genotype (not a remote possibility with only 4 mice) then your effect sizes will be confounded by the health condition and you may in fact pick up false-positive genes. With pooling this confounding is less likely.

I don't know... Saying that biological variability is something you want to keep sounds good of course but I would like to see some example to convince myself. If a gene is highly variable between mice but on average slightly higher in one condition, pooling could pick that gene up at FDR below a given cutoff but its FDR will be higher than a gene with the same average difference but more stable across mice (pools will still have some variability between each other). That is, an FDR ranking will put the variable gene after the stable one, that's good, right? With individual mice, the variable gene could get a very low FDR just because the controls got only mice with low expression and the mutants only mice with high expression (again, not unlikely given only 4 replicates).
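For anyone who wants an example to play with, below is a rough simulation sketch of the two designs from the question: 4 single-mouse libraries vs. 4 pools of 3 mice per group. It is a toy model, not DESeq2: one gene, normally distributed expression, equal RNA input per mouse in a pool, independent pools, no library-prep or sequencing noise, and a two-sample t-test at the 5% level. `library` and `rejection_rate` are hypothetical helpers, and the effect size of 1 SD is made up.

```python
import math
import random
import statistics

N_LIB = 4            # libraries per group in both designs
T_CRIT_DF6 = 2.447   # two-sided 5% critical value of Student's t, 6 df

def library(rng, mean, sigma, pool_size):
    """One library: average of `pool_size` mice (equal RNA input assumed)."""
    return statistics.fmean(rng.gauss(mean, sigma) for _ in range(pool_size))

def rejection_rate(pool_size, effect, sigma=1.0, n_sim=4000, seed=2):
    """Fraction of simulated experiments where the t-test rejects at 5%."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        ctrl = [library(rng, 0.0, sigma, pool_size) for _ in range(N_LIB)]
        mut = [library(rng, effect, sigma, pool_size) for _ in range(N_LIB)]
        pooled_var = (statistics.variance(ctrl) + statistics.variance(mut)) / 2
        t = (statistics.fmean(mut) - statistics.fmean(ctrl)) / math.sqrt(2 * pooled_var / N_LIB)
        hits += abs(t) > T_CRIT_DF6
    return hits / n_sim

rates = {
    ("single", "null"): rejection_rate(pool_size=1, effect=0.0),
    ("pool3", "null"): rejection_rate(pool_size=3, effect=0.0),
    ("single", "effect"): rejection_rate(pool_size=1, effect=1.0),
    ("pool3", "effect"): rejection_rate(pool_size=3, effect=1.0),
}
for key, rate in rates.items():
    print(key, f"{rate:.3f}")
```

In this toy setting the false-positive rate with no true difference stays near the nominal 5% for both designs, while the pooled design detects a true difference more often. The sketch says nothing about outlier mice, unequal RNA contributions, or confounders, which are the practical concerns raised elsewhere in the thread.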


Yes, you are probably right. Although it is not something that is very clear, I think I will choose the second option for now. Thank you so much :)

2.7 years ago

A (the biological variability is something you want to keep even if it can complicate interpretation)
