RNA-seq: is there a way to measure the variability between biological replicates?
5.7 years ago
▴ 240

Hi,

As the title says, I would like to know whether there is a way to quantify the variability between biological replicates, ideally as a percentage.

For example, suppose we have 2 controls vs 2 treated samples. Is there a way to tell how much the controls differ from each other in terms of reproducibility?

Let's say I have the per-gene read counts for these samples. Would quantile-normalizing the count matrix and then drawing an MA plot help us see how "divergent" or "similar" they are?

Thanks
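For readers unfamiliar with the MA plot mentioned above, here is a minimal sketch of the values it displays for a pair of replicates. It assumes a simple counts-per-million scaling rather than quantile normalization, and the function name `ma_values` and the example counts are made up for illustration.

```python
import math

def ma_values(counts_a, counts_b, pseudocount=1.0):
    """Return a list of (A, M) pairs, one per gene, for two samples.

    M = log2(a) - log2(b)        log ratio between the two replicates
    A = (log2(a) + log2(b)) / 2  average log abundance
    """
    # Scale each sample to counts per million so that a difference in
    # sequencing depth does not shift every M value in the same direction.
    total_a, total_b = sum(counts_a), sum(counts_b)
    pairs = []
    for a, b in zip(counts_a, counts_b):
        log_a = math.log2(a / total_a * 1e6 + pseudocount)
        log_b = math.log2(b / total_b * 1e6 + pseudocount)
        pairs.append(((log_a + log_b) / 2, log_a - log_b))
    return pairs

# Hypothetical counts for five genes in two control replicates.
control_1 = [100, 250, 0, 40, 1000]
control_2 = [110, 240, 2, 35, 1050]
for a_val, m_val in ma_values(control_1, control_2):
    print(f"A={a_val:.2f}  M={m_val:.2f}")
```

For well-behaved replicates the M values should scatter tightly around zero, with more spread at low A where counts are small.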

RNA-Seq • 3.8k views

Can you make your example have more than two replicates? There are tools to measure changes in variability, but you need reasonable replicate numbers for that.


Sure, Devon, but in my case I only have two replicates. What are my options? Could you also tell me which tools you mean? Thanks.


If you only have two replicates there is no sense in which you can estimate the within-group variability.

5.7 years ago
igor 13k

You can check the DESeq2 vignette for some QC ideas, including how to perform normalization.

For example, you can look at sample-to-sample distances, which give you a quantitative measure of how similar the samples are.

[Image: sample-to-sample distance heatmap, as in the DESeq2 vignette]
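To make the sample-distance idea concrete, here is a rough sketch in plain Python. It is not the DESeq2 code: it assumes a simple log2 counts-per-million transform as a stand-in for the variance-stabilizing or rlog transforms the vignette uses, and the sample names and counts are hypothetical.

```python
import math

def log_cpm(counts, pseudocount=1.0):
    """log2 counts-per-million for one sample (simple stand-in for vst/rlog)."""
    total = sum(counts)
    return [math.log2(c / total * 1e6 + pseudocount) for c in counts]

def euclidean(x, y):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def distance_matrix(samples):
    """samples: dict of name -> raw counts (same gene order in every sample).

    Returns a nested dict so that result[a][b] is the distance between a and b.
    """
    logged = {name: log_cpm(counts) for name, counts in samples.items()}
    return {a: {b: euclidean(logged[a], logged[b]) for b in logged} for a in logged}

# Hypothetical counts for five genes in a 2 vs 2 design.
samples = {
    "ctrl_1": [100, 250, 5, 40, 1000],
    "ctrl_2": [110, 240, 6, 35, 1050],
    "trt_1": [300, 90, 50, 42, 980],
    "trt_2": [280, 100, 45, 38, 1010],
}
dist = distance_matrix(samples)
names = list(samples)
for a in names:
    print(a, " ".join(f"{dist[a][b]:6.2f}" for b in names))
```

Replicates of the same condition should end up closer to each other than to samples from the other condition; if they do not, that is worth investigating before any differential expression analysis.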


But in DESeq2 that's only possible after performing the analysis, right? I would like to see this directly from the read counts. As far as I know, DESeq2 cannot do it before the analysis, but I am not sure.


That's part of the exploratory data analysis, and not the differential expression analysis.


OK, thanks for the info.


Differential analysis comes later, and Wouter is absolutely on point: you need to explore the data first, before jumping into downstream DE analysis and beyond.


I agree that an initial set of QC plots is important.

However, I would also say your differential expression and functional enrichment should inform decisions about revising upstream steps like normalization (and you should expect to have several rounds of analysis / discussion before publication). So, I would not be surprised if you were going back to produce alternative QC plots based upon subsequent steps of analysis (in a later "round").

5.7 years ago

What about pairwise correlation, R^2?


Would it also work in this case? What about normalizing the read counts?


Wouter's suggestion is probably the most straightforward: first normalize, then log-transform, then compute the correlation. There are many options for the normalization. Quantile normalization is a bit uncommon for RNA-seq data; I would rather suggest using the normalization and rlog transformation from DESeq2. Finally, an alternative way to assess replicate consistency in a broader context is to perform a PCA of your samples.
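The normalize, log-transform, correlate sequence can be sketched as follows. This is an illustration only, not DESeq2 itself: it assumes a median-of-ratios size factor in the style of DESeq2's normalization and uses a plain log2(x + 1) in place of rlog. The function names are made up for the example.

```python
import math
from statistics import median

def size_factors(matrix):
    """Median-of-ratios size factors.

    matrix: list of samples, each a list of per-gene counts (same gene order).
    Genes with a zero count in any sample are skipped; at least one gene must
    be non-zero in all samples, otherwise median() raises StatisticsError.
    """
    n_genes = len(matrix[0])
    log_geo_mean = []
    for g in range(n_genes):
        values = [sample[g] for sample in matrix]
        if all(v > 0 for v in values):
            log_geo_mean.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_mean.append(None)
    factors = []
    for sample in matrix:
        log_ratios = [math.log(sample[g]) - log_geo_mean[g]
                      for g in range(n_genes) if log_geo_mean[g] is not None]
        factors.append(math.exp(median(log_ratios)))
    return factors

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def replicate_correlation(counts_a, counts_b):
    """Normalize two samples, log-transform, and return their Pearson r."""
    factor_a, factor_b = size_factors([counts_a, counts_b])
    log_a = [math.log2(c / factor_a + 1) for c in counts_a]
    log_b = [math.log2(c / factor_b + 1) for c in counts_b]
    return pearson(log_a, log_b)

# Hypothetical counts for two control replicates.
r = replicate_correlation([100, 250, 5, 40, 1000], [110, 240, 6, 35, 1050])
print(f"Pearson r = {r:.4f}, R^2 = {r * r:.4f}")
```

Keep in mind that correlations between RNA-seq samples are usually very high simply because expression spans several orders of magnitude, so the value is most useful for comparing pairs of samples against each other rather than as an absolute score.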


+1 to both the sample distance clustering and the PCA suggestions. Both should give the OP what they are looking for. Pairwise correlation coupled with sample distance clustering is another way to do it. The DESeq2 vignette is the key resource; also look into corrplot. It should be a pretty straightforward approach. Having said all that, remember what Carlo is suggesting: proper normalization and scaling are important before producing any visualization you intend to interpret statistically.


I'm not sure if we are already talking about pretty much the same thing, but you can use Pearson Dissimilarity as a distance metric (instead of Euclidean Distance) in a dendrogram for all features (or a heatmap for differentially expressed genes).
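As a small sketch of Pearson dissimilarity used as a distance: it is defined here as 1 - r on already log-transformed expression profiles. Instead of building a full dendrogram, the example just reports each sample's closest partner, which corresponds to the first merges a hierarchical clustering would make. The profile values and the helper name `closest_partner` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def pearson_dissimilarity(x, y):
    """1 - r: 0 for perfectly correlated profiles, 2 for perfectly anti-correlated."""
    return 1.0 - pearson(x, y)

def closest_partner(profiles):
    """profiles: dict of name -> log-scale expression values.

    Returns, for each sample, the name of the other sample with the
    smallest Pearson dissimilarity.
    """
    result = {}
    for a in profiles:
        others = [b for b in profiles if b != a]
        result[a] = min(others,
                        key=lambda b: pearson_dissimilarity(profiles[a], profiles[b]))
    return result

# Hypothetical log-scale expression for five genes in a 2 vs 2 design.
profiles = {
    "ctrl_1": [6.6, 8.0, 2.3, 5.3, 10.0],
    "ctrl_2": [6.8, 7.9, 2.6, 5.1, 10.0],
    "trt_1": [8.2, 6.5, 5.6, 5.4, 9.9],
    "trt_2": [8.1, 6.6, 5.5, 5.2, 10.0],
}
print(closest_partner(profiles))
```

Unlike Euclidean distance, this measure ignores overall shifts in level between samples and only reflects whether the expression patterns agree.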

5.7 years ago
Ashastry ▴ 60

A correlation matrix can answer your question, although I second the thought that it is always better to have at least 3 replicates for this kind of analysis. DESeq2 accounts for extreme variability in its model if there are more replicates. The authors also recommend box plots to detect outliers; again, this makes more sense with more replicates. Please read the vignette for more information.

5.7 years ago

I would recommend visualizing differentially expressed genes in a heatmap with an independently calculated expression value (such as FPKM).

In general, I tend to think of the p-value as a way to filter gene candidates based upon replicate variability. However, it is also important to keep in mind that differences may sometimes be clearer in some samples than in others. You more or less have to test the different methods for each project to see what looks visually satisfactory for your particular dataset: if I give you a generalization, you will eventually be able to find an exception to it.
