Hi all
For context: I have four Illumina RNA-seq datasets (let's call them A, B, control A and control B). I know that cells from conditions A and B produce a certain metabolite. The plan is to determine the DE genes in A_vs_controlA and B_vs_controlB and see which are common to both. However, I only have 2 replicates of control B (which has a different origin from control A). I believe each of these samples came from a library made from several different individuals, though I'm not sure how relevant that is. I know that, statistically, I should have at least 3 replicates, but for several reasons it is not possible to obtain more data right now.
What approaches can I take to make my inferences more "robust"? Should I lower the adjusted p-value threshold to be more restrictive? Should I simulate data based on my 2 samples?
Best
Please use Google and the search function to look for unreplicated RNA-seq experiments. This has literally been discussed dozens of times. In short: your results, no matter how you twist and turn them, will not be reliable, since statistics requires replicates. There is more in the numerous threads you can find online.
How edgeR handles no replicates
Please read the manual before doing things like this. It is not a reliable method. If possible, run some more replicates.
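
For the comparison described in the question (DE genes in A_vs_controlA and B_vs_controlB, then the overlap), here is a minimal Python sketch of the intersection step with a stricter adjusted p-value cutoff. Everything in it is an assumption for illustration: the gene names, the numbers, and the helper names `significant_genes` and `shared_de_genes` are hypothetical, and the per-gene results would in practice come from whatever DE tool you use. Note that tightening the cutoff only filters the output; it does not make up for the missing replicates.

```python
from typing import Dict, Set, Tuple

# Hypothetical per-gene results: gene -> (log2 fold change, adjusted p-value).
# All values below are made up purely for illustration.
DEResult = Dict[str, Tuple[float, float]]


def significant_genes(
    results: DEResult, padj_cutoff: float = 0.05, min_abs_lfc: float = 1.0
) -> Dict[str, str]:
    """Return genes passing both cutoffs, mapped to their direction of change."""
    return {
        gene: ("up" if lfc > 0 else "down")
        for gene, (lfc, padj) in results.items()
        if padj < padj_cutoff and abs(lfc) >= min_abs_lfc
    }


def shared_de_genes(
    first: DEResult,
    second: DEResult,
    padj_cutoff: float = 0.05,
    min_abs_lfc: float = 1.0,
) -> Set[str]:
    """Genes significant in both contrasts and changing in the same direction."""
    sig_first = significant_genes(first, padj_cutoff, min_abs_lfc)
    sig_second = significant_genes(second, padj_cutoff, min_abs_lfc)
    return {
        gene
        for gene in sig_first.keys() & sig_second.keys()
        if sig_first[gene] == sig_second[gene]
    }


# Made-up example tables for the two contrasts.
a_vs_control_a: DEResult = {
    "geneX": (2.1, 0.001),
    "geneY": (-1.5, 0.02),
    "geneZ": (0.4, 0.001),
    "geneW": (1.8, 0.04),
}
b_vs_control_b: DEResult = {
    "geneX": (1.7, 0.005),
    "geneY": (1.2, 0.03),
    "geneZ": (2.0, 0.2),
    "geneW": (1.3, 0.008),
}

print(sorted(shared_de_genes(a_vs_control_a, b_vs_control_b)))
print(sorted(shared_de_genes(a_vs_control_a, b_vs_control_b, padj_cutoff=0.01)))
```

Requiring the same direction of change in both contrasts, plus a minimum fold change, is one way to make the overlap more conservative; it is a design choice in this sketch, not something the thread prescribes.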