Question

DESeq2 for differential expression analysis of RNA-seq data with few samples per condition?

0

9.1 years ago

amyfm ▴ 10

Hi, I would like to do differential expression analysis of RNA-seq data between 2 conditions, and I only have 2-3 samples per condition. Would you recommend DESeq2 for this case, or another method? Would the results be reliable? I had more samples, but I had to remove them from my analysis because of poor quality.

RNA-Seq deseq2 samples replicates • 4.3k views

ADD COMMENT • link updated 2.2 years ago by Ram 44k • written 9.1 years ago by amyfm ▴ 10

0

I would say that's a very common situation for RNA-seq. You can have a look at this paper to get an idea of the accuracy of the methods. (DESeq2 had not been developed at that point; I would guess it performs a bit better than DESeq.)

ADD REPLY • link 9.1 years ago by Martombo ★ 3.1k

0

Thank you. According to that paper, DESeq (and, I imagine, DESeq2) performs better than other methods when using few replicates.

ADD REPLY • link 9.1 years ago by amyfm ▴ 10

0

My samples are ovarian tissue in 2 different conditions (high and low fertility) and, as I said, I have 2-3 replicates per condition.

When I try the differential expression analysis between only 2 samples (1 vs 1), I obtain a smaller list of genes than when I do it between 5 samples (2 vs 3). This makes me think that, in this case, using few replicates is not going to give me many false positives: otherwise the 1 vs 1 analysis would give me a large list of unreliable genes, and it doesn't.

ADD REPLY • link 9.1 years ago by amyfm ▴ 10

0

Basically, the more biological replicates the better. The more samples you have, the better you can estimate variance and thus reliably report statistically robust observations. If you used DESeq2 on "1 v 1" samples, then the software will act very conservatively, which suggests that if you're getting statistically significant results, the biological effect is quite strong. Adding more replicates means that you will likely get more statistically significant results.
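
To put a rough number on "the better you can estimate variance", here is a small Python sketch. It relies on a textbook result for normally distributed data: the relative standard deviation of a sample variance from n replicates is sqrt(2 / (n - 1)). RNA-seq counts are not normal, and DESeq2 additionally shares information across genes when estimating dispersion, so this is only an illustration of the trend, not a description of what DESeq2 computes. The function name is made up for this example.

```python
import math


def relative_sd_of_variance_estimate(n_replicates):
    """Relative standard deviation of the sample variance, sqrt(2 / (n - 1)).

    Holds for normally distributed data only (an assumption made here for
    illustration; RNA-seq counts are overdispersed count data).
    """
    if n_replicates < 2:
        # With a single replicate there is no within-group variance estimate.
        raise ValueError("a variance estimate needs at least 2 replicates")
    return math.sqrt(2.0 / (n_replicates - 1))


# Print how the uncertainty of the variance estimate shrinks with replicates.
for n in (2, 3, 6, 12):
    print(n, round(relative_sd_of_variance_estimate(n), 2))
```

Under these assumptions, a per-gene variance estimated from 2 replicates is off by roughly 140% of its true value on average, and from 3 replicates by roughly 100%, which is why low-replicate designs tend to be underpowered.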

ADD REPLY • link updated 5.0 years ago by Ram 44k • written 9.1 years ago by andrew.j.skelton73 6.6k

0

The number of false positives should be the same no matter how many reps you use. What changes is the number of true positives and the % of calls that are good. The fraction of genes falsely called at a given p-value cutoff should equal that cutoff (i.e. about 1% of the genes are falsely called at p<0.01). Your false discovery rate (the % of all your calls which are false positives) will decrease with a greater number of replicates because you will be discovering more true positives. So if you are calling 300 of 30,000 genes with no replicates at p<0.01, all of your calls are expected to be bad, since 1% of 30,000 is 300. If you call 600 genes with 2 reps at p<0.01, half of your calls are expected to be good! But the other half are still bad. Of course, you can't really calculate a valid p-value with one rep because you are only guessing at what the variance is. Usually you want to choose a p-value cutoff restrictive enough that your expected FDR is around 10% or lower, but that is not always possible with low-powered experiments.
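
The arithmetic behind those numbers can be sketched in a few lines of Python. This is only an illustration: the function name `expected_fdr` is made up here, and it assumes the p-values are well calibrated and that nearly all tested genes are truly unchanged, so that about (genes tested × cutoff) genes pass the cutoff by chance.

```python
def expected_fdr(n_genes_tested, p_cutoff, n_genes_called):
    """Rough expected fraction of called genes that are false positives.

    Assumes calibrated p-values and that almost every tested gene is truly
    unchanged, so roughly n_genes_tested * p_cutoff genes pass by chance.
    """
    if n_genes_called <= 0:
        raise ValueError("n_genes_called must be positive")
    expected_false_positives = n_genes_tested * p_cutoff
    # The FDR cannot exceed 1: at worst, every call is a false positive.
    return min(1.0, expected_false_positives / n_genes_called)


# Figures from the post: 30,000 genes tested at p < 0.01.
print(expected_fdr(30_000, 0.01, 300))  # all 300 calls expected to be false
print(expected_fdr(30_000, 0.01, 600))  # about half of the 600 calls
```

In practice you would usually work with adjusted p-values (for example, Benjamini-Hochberg) rather than back-of-the-envelope estimates like this one.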

ADD REPLY • link 9.1 years ago by Michele Busby ★ 2.2k

0

Thank you very much for your help.

When I do my differential expression analysis between 2 and 3 different animal models, I obtain fewer DEGs than if I do it between 3 and 3 (in this last case I am adding one sample that has regular quality - the others have good quality). My aim is to obtain as many DEGs as possible. Which analysis do you think I should do, and which would be more reliable: 2 vs 3 of good quality and more DEGs, or 3 vs 3 with one of regular quality and fewer DEGs?

ADD REPLY • link updated 5.0 years ago by Ram 44k • written 9.1 years ago by amyfm ▴ 10

Ram · Answer 1 · 2015-10-22

1

9.1 years ago

andrew.j.skelton73 6.6k

It depends on what the samples are. You can still run them through DESeq2, since you have biological replicates; it just may be underpowered depending on what you're looking for.

DESeq2 is good for gene-level analysis. Salmon/Kallisto + Sleuth for transcripts.

ADD COMMENT • link updated 5.0 years ago by Ram 44k • written 9.1 years ago by andrew.j.skelton73 6.6k

Ram · Answer 2 · 2015-10-27

0

9.1 years ago

marina.kimyr ▴ 20

I have published papers where we only had 3 replicates per condition in a control / condition setting. As long as you have more than 2 replicates, technically you can use DESeq2 for differential analysis. You can also discuss the small number of replicates in your paper if you feel it is worth mentioning.

Good luck

ADD COMMENT • link updated 5.0 years ago by Ram 44k • written 9.1 years ago by marina.kimyr ▴ 20

0

Thank you very much! By "more than 2 replicates", do you mean 2 replicates included?

ADD REPLY • link 9.1 years ago by amyfm ▴ 10

0

I meant at least 2 replicates for the control and at least 2 replicates for the sample :)

ADD REPLY • link 9.1 years ago by marina.kimyr ▴ 20