RNA sequencing pooling design and statistical analysis
9.2 years ago
Sam ★ 4.8k

We are currently planning to perform RNA sequencing on a total of 35 mouse samples. There are two factors: treatment and disease status. We would like to identify differentially expressed genes between the conditions, e.g. Treatment A compared with Treatment B in cases, Treatment A compared with Treatment B in controls, etc.

We only have enough money to sequence 12 samples (1 lane with 12 indexes), so we are planning to pool samples before sequencing. Our concern is: statistically, what is the best pooling strategy? Once samples are pooled, the counts will most likely no longer follow the negative binomial distribution assumed by tools like edgeR and DESeq2, so the power of these tests is bound to be affected.

Does anyone have experience with pooled RNA-Seq analysis? What should be taken into account when performing the analysis? How should we design the pools? Should we use ERCC spike-ins? If so, how should we use them?

Thank you

RNA-Seq • 4.2k views
9.2 years ago

Why do you need to pool? Just sequence 3 samples per group and you'll have your 12. If for some reason you must pool, then pool within litters. Mice within a litter aren't really replicates anyway, so you're not masking as much of the variability.
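If the worry is that a hand-picked subset could be challenged, one option is to make the draw reproducible. The following is only a minimal sketch: the group labels, sample IDs, group sizes and the seed are hypothetical placeholders, and `pick_samples` is an illustrative helper, not part of any package.

```python
import random

def pick_samples(groups, n_per_group=3, seed=12345):
    """Draw n_per_group samples from each group with a fixed seed,
    so the same selection can be regenerated and documented."""
    rng = random.Random(seed)
    chosen = {}
    # Sort group names and sample IDs so the draw does not depend on input order.
    for name in sorted(groups):
        chosen[name] = sorted(rng.sample(sorted(groups[name]), n_per_group))
    return chosen

# Hypothetical sample sheet: 4 groups (disease status x treatment), 35 samples in total.
groups = {
    "case_A": [f"case_A_{i}" for i in range(1, 8)],
    "case_B": [f"case_B_{i}" for i in range(1, 11)],
    "control_A": [f"control_A_{i}" for i in range(1, 10)],
    "control_B": [f"control_B_{i}" for i in range(1, 10)],
}

selection = pick_samples(groups)
for name, ids in selection.items():
    print(name, ids)
```

Recording the seed alongside the sample sheet lets anyone rerun the draw and get the same 12 samples.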

ERCC spike-ins don't help with pooling, other than perhaps allowing you to tell if samples were equally pooled (e.g., using a different subset of the spike-ins for each sample in a pool).


We were slightly worried that a random selection of samples might be challenged by others. Though now that you mention it, this is more or less how we usually do things: randomly selecting a certain number of samples instead of sequencing the whole population.

As for the litter pooling, we currently have 7, 10, 9 and 9 samples per group. Counting litters, we have 3 litters per group. The problem is that litter sizes differ quite a lot: some litters have as many as 5 samples and some as few as 1.

So do you think we should pool by litter, or should we just select 3 samples per group (randomly, or by RIN and concentration) and perform the analysis on those, with the remaining samples used for qPCR?


It's always best to avoid pooling unless absolutely required. If you have 3 litters per group then just take a sample with a nice concentration/RIN per litter. Note that while you can use the other samples for qPCR, they still won't represent true replicates. There's simply too much correlation between littermates (yes, I realize that this results in needing a lot of cage space for experiments, I've been there).
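As a rough illustration of "one sample with a nice concentration/RIN per litter", the selection could be scripted along these lines. This is a sketch under assumptions: the field names (`group`, `litter`, `rin`, `conc`), the sample IDs and all values are made up, and ranking by RIN first and concentration second is just one possible rule.

```python
def best_per_litter(samples):
    """Keep one sample per (group, litter): highest RIN, ties broken by concentration."""
    best = {}
    for s in samples:
        key = (s["group"], s["litter"])
        current = best.get(key)
        if current is None or (s["rin"], s["conc"]) > (current["rin"], current["conc"]):
            best[key] = s
    return best

# Hypothetical QC table for one group with three litters.
samples = [
    {"id": "m01", "group": "case_A", "litter": "L1", "rin": 8.9, "conc": 120.0},
    {"id": "m02", "group": "case_A", "litter": "L1", "rin": 9.4, "conc": 95.0},
    {"id": "m03", "group": "case_A", "litter": "L2", "rin": 7.8, "conc": 150.0},
    {"id": "m04", "group": "case_A", "litter": "L3", "rin": 9.1, "conc": 80.0},
    {"id": "m05", "group": "case_A", "litter": "L3", "rin": 9.1, "conc": 110.0},
]

picked = best_per_litter(samples)
for (group, litter), s in sorted(picked.items()):
    print(group, litter, s["id"])
```

With 3 litters per group and 4 groups, this kind of rule yields the 12 libraries without any pooling.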


Thanks Devon, we really want to make sure we have got our experimental design right. Always good to have comments from other people. This helps us a lot!

8.2 years ago

Hi,

My service vendor says they will calculate p-values using only one sample per group (there are two groups). How is that possible? Can anybody explain? I thought that to assess the statistical significance of differential expression for any gene between two groups, we need at least three samples per group.


Hi kumarsudershan, please start a new question. I'd suggest asking your vendor for more details.
