Dear community,
my supervisor has asked me to analyse bulk RNA-seq data. The samples were acquired at different times in different experiments without shared controls, so no correction for batch effects is possible and differential gene expression analysis is unreliable.
We are discussing the analyses that we can do considering the imperfect situation. One thing that seems to be possible is to perform within-sample comparisons, by calculating TPMs and comparing the expression of GeneX to that of GeneY.
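To make sure we mean the same thing by TPM, here is a minimal sketch of the calculation as I understand it (length-normalize the counts, then rescale so the sample sums to one million). The gene names, counts and lengths are made up for illustration and are not from our data.

```python
def tpm(counts, lengths_kb):
    """Transcripts per million for one sample.

    counts     -- dict of gene -> read count (hypothetical values below)
    lengths_kb -- dict of gene -> effective length in kilobases
    """
    # Step 1: reads per kilobase, which removes the gene-length effect.
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    # Step 2: rescale so that all genes in the sample sum to one million.
    total = sum(rates.values())
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Made-up example: GeneY has twice the reads of GeneX, but is twice as long.
counts = {"GeneX": 500, "GeneY": 1000, "GeneZ": 250}
lengths_kb = {"GeneX": 2.0, "GeneY": 4.0, "GeneZ": 1.0}

sample_tpm = tpm(counts, lengths_kb)
# After length normalization, GeneX and GeneY come out equally expressed.
same_level = sample_tpm["GeneX"] == sample_tpm["GeneY"]
```

This is the sense in which I consider a within-sample comparison of GeneX to GeneY reasonable: both values share the same per-sample denominator.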
Now, why can't we take the ratio of GeneX / GeneY and compare these ratios between SampleA and SampleB?
I have a hard time justifying why I think this is a bad idea, because I share the intuition that it should work. My "hunch" is that by calculating the ratio, we are basically performing count normalization, but with just GeneY as the reference instead of a factor derived from many genes, as with DESeq2's median-of-ratios or edgeR's TMM. Comparing the ratios between samples is therefore equivalent to a differential gene expression analysis (which we already established is flawed) that normalizes to a single gene. However, this does not make batch effects go away; they are just ignored. Thus, this approach is only "valid" if GeneX and GeneY happen to be exempt from any batch-specific biases, or at least affected by them equally. Would you agree with this?
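To illustrate the hunch, here is a toy sketch with made-up numbers. It assumes a deliberately simple model in which an observed value is the true abundance times a sample-wide factor (e.g. sequencing depth) times an optional gene-specific batch bias. The function names and the model itself are my own assumptions for illustration, not an established method.

```python
def observe(true_abundance, sample_factor, gene_bias=None):
    """Return TPM-like values for one sample under a toy bias model.

    true_abundance -- dict of gene -> true relative abundance
    sample_factor  -- factor applied to every gene in the sample
    gene_bias      -- dict of gene -> batch bias for that gene only
    """
    gene_bias = gene_bias or {}
    raw = {
        gene: value * sample_factor * gene_bias.get(gene, 1.0)
        for gene, value in true_abundance.items()
    }
    total = sum(raw.values())
    return {gene: value / total * 1e6 for gene, value in raw.items()}


def ratio(tpm, gene_x="GeneX", gene_y="GeneY"):
    """GeneX / GeneY within one sample."""
    return tpm[gene_x] / tpm[gene_y]


# Identical biology in both samples: the true GeneX / GeneY ratio is 2.
truth = {"GeneX": 100.0, "GeneY": 50.0, "GeneZ": 850.0}

sample_a = observe(truth, sample_factor=1.0)
# Batch effect that hits every gene equally: it cancels in the ratio.
sample_b_global = observe(truth, sample_factor=3.0)
# Batch effect that halves GeneY only: it does not cancel.
sample_b_biased = observe(truth, sample_factor=3.0, gene_bias={"GeneY": 0.5})

ratio_a = ratio(sample_a)
ratio_b_global = ratio(sample_b_global)   # still approximately 2
ratio_b_biased = ratio(sample_b_biased)   # approximately 4, with no biological change
```

In this toy model the ratio protects against sample-wide scaling only. Any bias that affects GeneX and GeneY differently shows up directly as an apparent difference between SampleA and SampleB.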
Somehow I have a hole in my thinking going from "TPMs are okay intra-sample" to "comparing ratios of valid intra-sample TPMs across samples is invalid". Doesn't this by proxy imply that performing intra-sample TPM comparisons is kind of moot anyway because the information gained cannot be put into context?
Maybe someone can help me nudge my train of thought in a productive direction.
Thanks!