Question

Can you pool individuals to make 1 biological replicate?

9.6 years ago

Biogeek ▴ 470

Dear all,

I would like people's opinions on this approach to RNA-Seq.

If I am running an RNA-Seq study and we have 4 treatments:

control
low
med
high

and we have 4 time points, T0,T1,T2,T3

The original plan was to take 3 biological replicates (1 plant per biological replicate) for each treatment at each time point, which is the norm in DGE RNA-Seq experiments. However, would it be deemed inappropriate if we instead took 6 plants (2x3) at each time point and treatment, and pooled 2 of the 6 plants into each of the 3 biological replicates?

The idea behind this is to get a more accurate estimate of differential gene expression between treatments across time points, whilst minimising statistical variance. Obviously some plants within a treatment will be stronger or weaker than others and will behave differently, despite a 2-week acclimation period under the control treatment.

Please let me know if this pooling of tech reps into biological reps would be frowned upon or cause bias, and also whether it would in fact make for a sounder RNA-Seq DGE experiment and analysis.

Thanks very much.

biological-replicates RNA-Seq • 5.9k views

updated 23 months ago by Ram 44k • written 9.6 years ago by Biogeek ▴ 470

I was having a hard time understanding your question, but maybe you mean "pool individuals" instead of "pull individuals"? This would make a lot more sense.

edit: two years after the original posting, I finally edited the post.

updated 23 months ago by Ram 44k • written 9.6 years ago by h.mon 35k

Ram · Answer 1 · 2015-05-23

This topic is a "heated matter of preference".

Some people will say that by pooling individuals you average out the variance and can no longer see the true stochastic signal. For example, if you take the RNA profile of 1 cancer cell and compare it to 1 healthy cell, there will be big differences (though perhaps none that are statistically significant).

If you take 100000 different cancer cells and compare them to 100000 different healthy cells, you probably won't see many differences at all, except perhaps for small (but highly statistically significant) differences at the major, well-known oncogenes. All the interesting signal from the cancer profiles has been averaged out into background noise. You would not be able to 'see' the signal at the BRCA breast cancer genes, because they were not relevant to all the liver cancers, lung cancers, etc., and everything was pooled and sequenced together.

Of course, no one would ever do that experiment, but the averaging problem holds even for multiple cells from the same tissue, let alone for the same kinds of tissue. Say a lump of material all came from one individual with one kind of cancer, but the cause of the cancer was a bug in the cell's metaphase code. Since most cells are in growth phase when you grind the sample up and run it through the sequencer, the difference between wild-type and cancerous tissue is negligible. This is why "single-cell X" is getting more and more popular.
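To put a rough number on the averaging argument, here is a minimal sketch. It assumes every cell contributes the same amount of RNA to the pool and that expression mixes linearly; the function name `pooled_fold_change` and the example fractions are made up for illustration and do not come from any real dataset.

```python
def pooled_fold_change(fraction_affected, fold_change_in_affected):
    """Fold change observed in a pool when only a fraction of cells respond.

    Assumption (illustrative only): each cell contributes equal RNA,
    unaffected cells sit at a baseline expression of 1.0, and affected
    cells sit at `fold_change_in_affected` times that baseline.
    """
    baseline = 1.0
    unaffected_part = (1.0 - fraction_affected) * baseline
    affected_part = fraction_affected * fold_change_in_affected * baseline
    return (unaffected_part + affected_part) / baseline


# A 10-fold change in a shrinking fraction of the pooled cells.
for fraction in (1.0, 0.5, 0.05, 0.01):
    print(fraction, round(pooled_fold_change(fraction, 10.0), 2))
```

Under these assumptions a 10-fold change present in only 5% of the pooled cells shows up as roughly a 1.45-fold change in the bulk sample, which is easy to lose in the noise.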

https://player.vimeo.com/video/126829858

However, people on the other side of the fence will argue that technical and biological noise are so significant that trying to work with them and draw conclusions from very small confidence intervals is not only a bad idea, but is "destroying 21st century science" (recent guest lecturer at my institute), and that if you cannot commit to 20+ biological replicates per time point (pooled or otherwise), then you shouldn't even bother.

Of course, the truth lies somewhere in the middle. It depends on how likely you expect your 'signal' to be stochastic and how much you expect it to be continuous. It depends on how consistent your technical noise is. It depends on how much time/money you have to spare.

Pretty much mainly depends on that last one ;)

Ram · Answer 2 · 2015-05-24

I get why you would do that when you have different time points: it's hard to sample at exactly the time point you need, as plants vary in how fast they grow. I think it's okay to pool samples, but you would need to pool more than 2 per replicate for the pooling to make a meaningful statistical difference. And John has a good point that pooling will average out the expression of some genes. Anything with a very time-sensitive expression pattern would probably get averaged out. It also depends on what your goal is: if you only want to find a set of statistically significant genes, pooling would be better, but if you are trying to profile everything, you would get a lot of false negatives.
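As a rough illustration of how pool size affects replicate-to-replicate variance, here is a small simulation sketch. It is a toy model, not a real RNA-Seq pipeline: it assumes one gene, Gaussian biological and technical noise on a log-expression scale, and that a pool behaves like the mean of its plants. The function name `simulate_replicates` and the values `bio_sd=1.0` and `tech_sd=0.3` are invented for the example.

```python
import random
import statistics


def simulate_replicates(n_replicates, plants_per_pool, bio_sd, tech_sd, rng):
    """Simulate one gene's log-expression across sequenced libraries.

    Each library is modelled as the mean of `plants_per_pool` plants
    (biological noise) plus one draw of technical noise. All parameters
    are hypothetical.
    """
    values = []
    for _ in range(n_replicates):
        plants = [rng.gauss(0.0, bio_sd) for _ in range(plants_per_pool)]
        values.append(statistics.fmean(plants) + rng.gauss(0.0, tech_sd))
    return values


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 20000  # far more libraries than any real study, to make the variances stable

var_by_pool_size = {}
for k in (1, 2, 4, 8):
    libraries = simulate_replicates(n, k, bio_sd=1.0, tech_sd=0.3, rng=rng)
    var_by_pool_size[k] = statistics.variance(libraries)
    print(f"plants per pool = {k}: variance between libraries = {var_by_pool_size[k]:.3f}")
```

In this model the expected variance between libraries is bio_sd^2 / k + tech_sd^2, so going from 1 to 2 plants per pool removes half of the biological component but none of the technical one, and each further doubling helps less. Whether real plant RNA pools behave this way is an assumption of the sketch.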