Hello,
I'm currently analyzing an RNA-seq experiment consisting of clinical patient samples taken pre- and post-treatment, from individuals who had no response (NR, n=6), partial response (PR, n=4), or complete response (CR, n=2) to our compound. Unfortunately, no replicates were collected for any individual patient, but we're doing the best we can with these samples. The goal is hypothesis generation for downstream validation. Our main questions are:
1) Which genes consistently change expression after treatment?
2) Which genes change specifically in CR/PR patients and are unchanged in NR patients after treatment?
I'm trying to determine the best way to analyze these data with these limited resources. I've analyzed the pre- and post-treatment samples with CuffDiff and DESeq2, and got markedly different results. I'm currently trying to analyze them with IsoEM2/IsoDE2, as these perform bootstrapping to report confidence intervals and were designed for experiments without replicates. Do you have any insight into which of these programs (or a different one) would be best suited for an experiment without replicates? There doesn't seem to be any consensus in the literature, so I was hoping for any input.
Ultimately, I plan on calling differentially expressed genes by pooling the two CR, four PR, and six NR patients as "biological replicates" to determine genes that change within each group, then looking at the fold change of these genes within each individual patient. Does this sound like a reasonable approach?
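To make the pooling idea concrete, here is a rough sketch of what "patients as replicates, then per-patient fold change" could look like for a single gene. It is not what CuffDiff, DESeq2, or IsoDE2 do internally; it only computes a paired log2 fold change per patient and a one-sample t statistic across the patients in one response group. The function names, the pseudocount of 1.0, and the example expression values are all assumptions made up for illustration.

```python
import math
from statistics import mean, stdev


def paired_log2fc(pre, post, pseudo=1.0):
    """Log2 fold change post vs. pre for one patient (pseudocount is an assumption)."""
    return math.log2((post + pseudo) / (pre + pseudo))


def group_summary(pairs, pseudo=1.0):
    """Summarize one gene within one response group.

    pairs: list of (pre, post) expression values, one tuple per patient.
    Returns the per-patient log2 fold changes, their mean, and a one-sample
    t statistic (mean / standard error) with n - 1 degrees of freedom.
    """
    lfcs = [paired_log2fc(pre, post, pseudo) for pre, post in pairs]
    n = len(lfcs)
    avg = mean(lfcs)
    if n < 2:
        t_stat = math.nan  # cannot estimate variance from a single patient
    else:
        se = stdev(lfcs) / math.sqrt(n)
        if se > 0:
            t_stat = avg / se
        elif avg == 0:
            t_stat = 0.0
        else:
            t_stat = math.copysign(math.inf, avg)
    return {"per_patient": lfcs, "mean": avg, "t": t_stat, "df": n - 1}


# Hypothetical values for one gene in the two CR patients: (pre, post)
cr_pairs = [(1.0, 3.0), (1.0, 7.0)]
print(group_summary(cr_pairs))
```

With only two CR patients the t statistic has a single degree of freedom, so in practice it would be little more than a ranking aid for that group.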
I've been wondering if there is a reasonable way to analyze each of these patients individually, then find which genes are consistently differentially expressed. I'm hesitant to put any faith into the reported p-values from DE programs, as there are no replicates. Would it be reasonable to use expression (minimum FPKM cutoff) and log2-fold change to call "putative differentially expressed genes" in each patient, then examine the overlap? Or am I opening a can of worms with this line of thinking?
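For reference, the per-patient idea could be sketched roughly as follows: apply a minimum FPKM cutoff and a log2 fold-change threshold within each patient, then count how many patients share each call. The thresholds (FPKM of at least 1 in one of the two samples, absolute log2 fold change of at least 1), the pseudocount, the function names, and the toy FPKM values are placeholders chosen for illustration, not recommended settings.

```python
import math
from collections import Counter


def call_putative_de(pre, post, min_fpkm=1.0, min_abs_lfc=1.0, pseudo=0.1):
    """Call putative DE genes for one patient.

    pre, post: dicts mapping gene -> FPKM before and after treatment.
    A gene is kept only if it reaches min_fpkm in at least one sample and
    its absolute log2 fold change is at least min_abs_lfc.
    Returns a dict mapping gene -> "up" or "down".
    """
    calls = {}
    for gene in pre.keys() & post.keys():
        a, b = pre[gene], post[gene]
        if max(a, b) < min_fpkm:
            continue  # too lowly expressed for a fold change to be meaningful
        lfc = math.log2((b + pseudo) / (a + pseudo))
        if abs(lfc) >= min_abs_lfc:
            calls[gene] = "up" if lfc > 0 else "down"
    return calls


def overlap_counts(calls_by_patient):
    """Count, for each (gene, direction), how many patients share that call."""
    counts = Counter()
    for calls in calls_by_patient.values():
        for gene, direction in calls.items():
            counts[(gene, direction)] += 1
    return counts


# Toy FPKM values for two hypothetical patients
patients = {
    "P1": ({"A": 2.0, "B": 5.0, "C": 0.1}, {"A": 10.0, "B": 5.5, "C": 0.3}),
    "P2": ({"A": 3.0, "B": 8.0, "C": 0.2}, {"A": 15.0, "B": 2.0, "C": 0.5}),
}
calls_by_patient = {p: call_putative_de(pre, post) for p, (pre, post) in patients.items()}
for (gene, direction), n in sorted(overlap_counts(calls_by_patient).items()):
    print(gene, direction, n)
```

Counting direction together with the gene keeps a gene that goes up in one patient and down in another from being reported as "consistent".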
Thank you very much for the help, this is a wonderful community!
Some info: Advice on RNASeq analysis without biological replicates for differential gene expression.
I have essentially the same characteristics in the dataset which I'm currently analysing. Using the different patients with the same clinical outcome as replicates doesn't work because there's a lot of heterogeneity between them, so no genes are found to be significantly differentially expressed. That kind of analysis only really works for small experiments using cell lines. Using a method such as GFold, also recommended by the advice linked to in the other comment, is a feasible approach to get some rankings for each pair of samples belonging to a patient. Once you show those to a biologist, it'll be apparent that different patients have different mechanisms of resistance, demonstrating why treating the different patients as biological replicates is not viable.
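A quick way to see that heterogeneity for yourself is to rank genes within each patient's pre/post pair and check how much the top of the lists overlaps between patients. The sketch below is only a stand-in and does not reproduce GFold's statistic: it ranks by a log2 fold change damped with a large pseudocount, then reports the Jaccard overlap of the top-k genes for each pair of patients. The function names, the pseudocount of 5.0, and the toy values are assumptions for illustration.

```python
import math
from itertools import combinations


def rank_genes(pre, post, pseudo=5.0):
    """Rank genes for one patient by absolute damped log2 fold change.

    The large pseudocount is a crude way to down-weight lowly expressed
    genes; it is NOT the GFold statistic.
    """
    scores = {
        g: math.log2((post[g] + pseudo) / (pre[g] + pseudo))
        for g in pre.keys() & post.keys()
    }
    # Largest absolute change first; gene name breaks ties deterministically
    return sorted(scores, key=lambda g: (-abs(scores[g]), g))


def top_k_jaccard(rankings, k):
    """Jaccard overlap of the top-k genes for every pair of patients."""
    out = {}
    for p, q in combinations(sorted(rankings), 2):
        a, b = set(rankings[p][:k]), set(rankings[q][:k])
        union = a | b
        out[(p, q)] = len(a & b) / len(union) if union else 0.0
    return out


# Toy expression values for two hypothetical patients with the same outcome
pre = {"A": 10.0, "B": 100.0, "C": 50.0, "D": 20.0}
rankings = {
    "P1": rank_genes(pre, {"A": 200.0, "B": 110.0, "C": 5.0, "D": 22.0}),
    "P2": rank_genes(pre, {"A": 12.0, "B": 900.0, "C": 55.0, "D": 1.0}),
}
print(top_k_jaccard(rankings, k=2))
```

Low overlap between patients in the same response group is the pattern described above, and it is what makes pooling them as replicates problematic.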