RNA-seq: Is it worth analyzing case vs control replicates separately to get a % similarity?
3
0
5.7 years ago
▴ 240

Hello,

Imagine I have Case (Ca) and Control (Co) samples with 2 replicates each; in other words, we would have Ca1, Ca2, Co1 and Co2.

Typically, one would analyze the data by comparing the conditions of both replicates together such as:

(Ca1 + Ca2) vs (Co1 + Co2)

Would it make sense to analyze each combination of the replicates (Ca1 vs Co1, Ca1 vs Co2, Ca2 vs Co1, Ca2 vs Co2), compare the resulting DEGs, and see how many of them are common and how many are divergent?

Thanks

RNA-Seq • 1.6k views
ADD COMMENT
0

It might, depending on whether your replicates are biological or technical.

ADD REPLY
0

And on the experimental design. Do we care about differences between Ca1 and Ca2? By the design of the experiment, those differences should be considered background noise.

ADD REPLY
0

Let's say that we do care about both of these differences; that's why I am asking if it makes sense to compare them that way.

ADD REPLY
1

Depending on the replication type and preprocessing of replicates etc, you might need to consider pairing/grouping the samples but this depends on experimental design as mentioned.

For instance, say you had 1 control culture of cells and 1 case culture. You split each into 3 replicates and treat that as 6 samples (2x3), right? Wrong! Because they all came from the same 2 original flasks, they are intimately grouped together in a way that cannot be disentangled entirely, and no matter how they were processed from that point onward, they are not truly separate replicates. That cannot simply be written off as noise (though in practice I suspect that's what many do).

ADD REPLY
1

Thanks for clearing that up. My samples come from a tissue section containing both tumor and normal cells. They were extracted using punch biopsies (two for each), which means they come from different parts of the tissue and thus not originally from the same cells.

ADD REPLY
0

They are biological duplicates

ADD REPLY
1

A happy medium might just be to look at the between-sample variances (PCA etc), and satisfy yourself that the 2 replicates are very similar, and no between-replicate extreme variability is going to cause issues. You probably don't need to do a complete DGE workflow (not that you could anyway without replicates).
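
As a rough illustration of that kind of between-sample check, here is a minimal sketch in plain Python. The counts are made up, the sample names follow the question, and Euclidean distance on log2(count + 1) is only one assumed choice of similarity measure; a real analysis would normally use a variance-stabilizing transform and PCA from an established package.

```python
import math
from itertools import combinations

# Hypothetical raw counts for five genes (not real data).
counts = {
    "Ca1": [120, 15, 900, 3, 48],
    "Ca2": [135, 11, 870, 4, 52],
    "Co1": [40, 14, 910, 3, 150],
    "Co2": [38, 18, 880, 2, 141],
}


def log_transform(values):
    """log2(count + 1): defined for zero counts, dampens very large counts."""
    return [math.log2(v + 1) for v in values]


def euclidean(a, b):
    """Euclidean distance between two equally long expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


logged = {name: log_transform(vals) for name, vals in counts.items()}

# Distance for every pair of samples, keyed by (sample_a, sample_b).
distances = {
    (a, b): euclidean(logged[a], logged[b])
    for a, b in combinations(logged, 2)
}

# Replicates of the same condition should sit closest to each other.
for (a, b), d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(f"{a} vs {b}: {d:.2f}")
```

If the two replicates of a condition are not each other's nearest neighbours in a table like this, that is worth investigating before any differential expression step.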

ADD REPLY
6
5.7 years ago
h.mon 35k

In my opinion, it doesn't, for two reasons:

1) Two samples per treatment is a very small number of replicates. The "rule of thumb" gives three samples per treatment as the bare minimum, and even then, in several cases three samples aren't enough to get proper statistical power.

2) This pairwise comparison does not generalize well, as the number of comparisons grows quickly as you add more samples (n × m one-vs-one comparisons for n cases and m controls).
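
To put a number on how fast the comparisons pile up, here is a small sketch that enumerates every one-vs-one pairing. The helper name `pairwise_comparisons` is made up for illustration. For n cases and m controls the count is n × m, so it is 4 for the 2 vs 2 design in the question and 100 for a 10 vs 10 design.

```python
from itertools import product


def pairwise_comparisons(cases, controls):
    """Return every (case, control) one-vs-one pairing."""
    return list(product(cases, controls))


# The design from the question: 2 cases and 2 controls.
pairs = pairwise_comparisons(["Ca1", "Ca2"], ["Co1", "Co2"])
print(pairs)

# How the number of pairings scales with group size.
sizes = {}
for n in (2, 3, 5, 10):
    cases = [f"Ca{i}" for i in range(1, n + 1)]
    controls = [f"Co{i}" for i in range(1, n + 1)]
    sizes[n] = len(pairwise_comparisons(cases, controls))
    print(f"{n} vs {n}: {sizes[n]} one-vs-one comparisons")
```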

To look at "divergent" within-treatment genes, I would look for the genes with low fold-changes and high standard deviation.
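
A minimal sketch of that filter, in plain Python with made-up counts. The gene names, the log2(count + 1) transform, and both thresholds (`LFC_MAX`, `SD_MIN`) are assumptions chosen for illustration, not recommended values. Note that with two replicates per group the standard deviation is a very rough estimate.

```python
import math
from statistics import mean, stdev

# Hypothetical counts per gene, two replicates per group (not real data).
counts = {
    "geneA": {"case": [100, 110], "control": [30, 35]},    # clear group difference
    "geneB": {"case": [20, 180], "control": [150, 40]},    # replicates disagree
    "geneC": {"case": [500, 510], "control": [495, 505]},  # stable everywhere
}

LFC_MAX = 0.5  # assumed cutoff: |log2 fold change| at or below this is "low"
SD_MIN = 1.0   # assumed cutoff: within-group SD at or above this is "high"


def summarize(groups):
    """Return (log2 fold change, largest within-group SD) on log2(count + 1)."""
    case = [math.log2(c + 1) for c in groups["case"]]
    control = [math.log2(c + 1) for c in groups["control"]]
    log2_fold_change = mean(case) - mean(control)
    within_group_sd = max(stdev(case), stdev(control))
    return log2_fold_change, within_group_sd


divergent = []
for gene, groups in counts.items():
    lfc, sd = summarize(groups)
    print(f"{gene}: log2FC={lfc:.2f}, within-group SD={sd:.2f}")
    if abs(lfc) <= LFC_MAX and sd >= SD_MIN:
        divergent.append(gene)

print("Low fold change, high within-group SD:", divergent)
```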

ADD COMMENT
0

Thanks for your feedback on this matter.

ADD REPLY
3
5.7 years ago
ATpoint 85k

I do not see any sense in this, because 1 vs 1 comparisons will not produce meaningful DEGs due to the lack of statistical power. You can do naive log2 fold change calculations, but your results will contain many false positives among low-count genes, where large fold changes are expected by chance (mean-variance dependency). Use your 2 vs 2 setup with any of the established DEG software (edgeR, DESeq2, limma, sleuth...). That is probably the best you can do. Depending on the variance among the intra-group replicates and between the groups (and on the biological effect size), you'll get no, few, or quite a few DEGs.
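
A toy illustration of the low-count problem, in plain Python with invented numbers. The function name `naive_log2fc` and the pseudocount of 1 are assumptions for this sketch. A difference of 6 reads at low depth gives a far larger naive fold change than a difference of 100 reads in a highly expressed gene, even though the former is much more likely to be sampling noise.

```python
import math


def naive_log2fc(case, control, pseudocount=1.0):
    """Naive 1 vs 1 log2 fold change; the pseudocount avoids division by zero."""
    return math.log2((case + pseudocount) / (control + pseudocount))


# Hypothetical (case, control) counts for two genes in a single 1 vs 1 comparison.
examples = {
    "low_count_gene": (7, 1),         # differs by only 6 reads
    "high_count_gene": (1100, 1000),  # differs by 100 reads
}

for gene, (case, control) in examples.items():
    print(f"{gene}: log2FC = {naive_log2fc(case, control):.2f}")
```

Ranking genes by this number alone would put the low-count gene at the top, which is why the established tools model the variance as a function of the mean instead.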

Edit: Good point from h.mon that I think makes sense:

To look at "divergent" within-treatment genes, I would look for the genes with low fold-changes and high standard deviation.

ADD COMMENT
0

Thank you for your feedback. That's also what I thought. I already did the analysis with DESeq2 and obtained around 2k DEGs. However, my supervisor requested that I make a comparison for each combination, and in my opinion that didn't really make any sense... I wanted to have the opinion of the community.

ADD REPLY
2

It does not make any sense. Tell your supervisor RNA-seq does not work like that and send him some standard publications. Again, 2k genes is a lot in theory for a case vs control comparison at this sample size; most of them will be false positives. If you know the tumor model and it has established signatures, see how those behave among the 2k DEGs; that is already a finding. But analyzing each combination does not make any sense statistically. A 1 vs 1 comparison gives no sound estimate of the overall variance, so its null hypothesis does not hold up. What you would be checking there is per-gene variability, and 1 vs 1 is not a proper statistical approach for that.
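
One possible way to do that signature check, sketched in plain Python. Everything here is hypothetical: the gene names, the DEG table, the expected directions, and the helper `check_signature`. In practice the table would come from your DESeq2 results and the signature from the literature on your tumor model.

```python
# Hypothetical DEG table: gene -> (log2 fold change, adjusted p-value).
deg_table = {
    "GENE_A": (2.1, 0.001),
    "GENE_B": (-1.7, 0.004),
    "GENE_C": (0.9, 0.03),
    "GENE_D": (-2.4, 0.0005),
}

# Hypothetical signature: gene -> expected direction in tumor vs normal.
signature = {"GENE_A": "up", "GENE_D": "up", "GENE_E": "down"}


def check_signature(deg_table, signature):
    """Report, for each signature gene, whether the DEG call matches expectation."""
    report = {}
    for gene, expected in signature.items():
        if gene not in deg_table:
            report[gene] = "not called as DEG"
            continue
        lfc, _padj = deg_table[gene]
        observed = "up" if lfc > 0 else "down"
        report[gene] = "consistent" if observed == expected else "opposite direction"
    return report


report = check_signature(deg_table, signature)
for gene, status in report.items():
    print(f"{gene}: {status}")
```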

ADD REPLY
1
5.7 years ago
ivivek_ngs ★ 5.2k

I believe you are not really after the similarity between case and control, right? You want to know how much they differ and what the inherent differences are?

  1. The statistical footing is still weak if this is bulk RNA-seq data. edgeR, DESeq2, limma, etc. will run, but you won't be able to make much of the results unless some of your differentially expressed genes are gold-standard markers for your disease model, have been seen in real-world data sets, or have been validated by PCR. I would expect a lot of false positives. Your intra-group variability estimate will not be optimal, and hence the overall variability estimate that these DEG tools are built on will not be robust either. Do a PCA or distance-based clustering. Look, in a deliberately biased way, at genes that you know are true positives: are they changing in your conditions or not? If they appear to change in any standard visualization, you have a trend. You can then go further with a differential expression analysis (DEA) to find DEGs, but still take those results with a grain of salt. Do not be overwhelmed by them unless some gold-standard genes are present or you have validated the top DEG hits.
  2. Now, if it's just similarity you are after, then you need a catalog or signature of marker genes that are not expected to change much between case and control, and you should visualize them. Track divergence as others pointed out. Hope I could shed some light.

Good luck!

ADD COMMENT
0

Thanks for the feedback ivivek_ngs. I will take into account what you said.

ADD REPLY
