Hi, I would like to do a differential expression analysis of RNA-seq data between 2 conditions, and I only have 2-3 samples per condition. Would you recommend DESeq2 for this case, or another method? Would the results be reliable? I had more samples, but I had to remove them from my analysis because of poor quality.
I would say that's a very common situation for RNA-seq. You can have a look at this paper to get an idea of the accuracy of the methods. (DESeq2 had not been developed yet at that point; I guess it would be a bit better than DESeq.)
Thank you. According to that paper, DESeq (and, I imagine, DESeq2) performs better than other methods when using few replicates.
My samples are ovarian tissue in 2 different conditions (high and low fertility) and, as I said, I have 2-3 replicates per condition.
When I try the differential expression analysis between only 2 samples (1 vs 1), I obtain a smaller list of genes than when I do it between 5 samples (2 vs 3). This makes me think that in this case using few replicates is not going to give me many false positives, because otherwise the 1 vs 1 analysis would give me a large list of unreliable genes, and it doesn't.
Basically, the more biological replicates the better. The more samples you have, the better you can estimate variance and thus report statistically robust observations. If you use DESeq2 on "1 v 1" samples, the software will act very conservatively, which suggests that if you're getting statistically significant results, the biological effect is quite strong. Adding more replicates means that you will likely get more statistically significant results.
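To make the point about variance concrete, here is a minimal sketch in Python (standard library only). It assumes normally distributed log-expression values with a true variance of 1, and the function name and all numbers are made up for illustration. It is not how DESeq2 estimates dispersion (DESeq2 shares information across genes); it only shows how noisy a per-gene variance estimate is with very few replicates.

```python
import random
import statistics

def variance_estimate_spread(n_reps, n_genes=2000, true_sd=1.0, seed=0):
    """Return the 5th and 95th percentile of per-gene sample variances
    when each gene is measured in n_reps replicates.
    Hypothetical helper; the true variance is true_sd**2 for every gene."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_genes):
        # Simulate one gene: n_reps replicate measurements around a mean of 0.
        values = [rng.gauss(0.0, true_sd) for _ in range(n_reps)]
        estimates.append(statistics.variance(values))
    estimates.sort()
    low = estimates[int(0.05 * n_genes)]
    high = estimates[int(0.95 * n_genes)]
    return low, high

for n in (2, 3, 6, 12):
    low, high = variance_estimate_spread(n)
    print(f"{n:2d} replicates: 90% of variance estimates fall in [{low:.2f}, {high:.2f}]")
```

With 2 replicates the estimates scatter widely around the true value of 1, and the interval tightens as replicates are added, which is why more replicates give more power.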
The number of false positives should be about the same no matter how many reps you use. What changes is the number of true positives and the % of calls that are good. The fraction of false positives at a given p-value cutoff should be the cutoff itself (i.e. about 1% of the genes that are not truly differentially expressed are falsely called at p<0.01). Your false discovery rate (the % of all your calls which are false positives) will decrease with a greater number of replicates because you will be discovering more true positives. So if you are calling 300 of 30,000 genes with no replicates at p<0.01, all of your calls are expected to be bad, since 30,000 x 0.01 = 300 false positives are expected by chance alone. If you call 600 genes with 2 reps at p<0.01, half of your calls are expected to be good! But the other half is still bad. Of course, you can't really calculate a valid p-value with one rep because you are only guessing at what the variance is. Usually you want to choose a p-value cutoff restrictive enough that your expected FDR is around 10% or lower, but that is not always possible with low-powered experiments.
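The arithmetic above can be written out as a small Python sketch. The function name is hypothetical, and it makes the simplifying assumption that every gene tested is a potential false positive (so the result is a rough upper bound on the expected FDR, not what a multiple-testing correction such as Benjamini-Hochberg would report).

```python
def expected_fdr(n_genes, p_cutoff, n_called):
    """Rough expected false discovery rate for a raw p-value cutoff.

    Assumes about n_genes * p_cutoff genes pass the cutoff by chance alone
    (i.e. treats all tested genes as null), so this is an upper bound.
    """
    if n_called <= 0:
        raise ValueError("n_called must be positive")
    expected_false = n_genes * p_cutoff
    # The FDR cannot exceed 100% of the calls.
    return min(1.0, expected_false / n_called)

# The two cases from the post: 30,000 genes tested at p < 0.01.
print(f"{expected_fdr(30_000, 0.01, 300):.0%}")  # 100%: all calls expected to be false
print(f"{expected_fdr(30_000, 0.01, 600):.0%}")  # 50%: half the calls expected to be false
```

To reach an expected FDR of around 10% under the same assumption, you would need about 3,000 calls at p<0.01, or a stricter cutoff.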
Thank you very much for your help.
Doing my differential expression analysis between 2 and 3 different animal models, I obtain fewer DEGs than if I do it between 3 and 3 (in the latter case I am adding one sample of regular quality; the others are of good quality). My aim is to obtain as many DEGs as possible. Which analysis do you think I should do, i.e. which would be more reliable: 2 vs 3 of good quality and more DEGs, or 3 vs 3 with one of regular quality and fewer DEGs?