17 months ago
nicola.calonaci
Suppose I have two bulk RNA-seq datasets that I want to compare. They come from independent experiments performed on two different groups of patients: one was prepared with a total RNA protocol, the other with poly(A) selection. I performed PCA on the first dataset and want to project the samples from the second onto the resulting PCA space. How do I normalise the data so that the two datasets are comparable in PCA space, i.e. so that they do not separate because of experiment or protocol bias?
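To make the projection step concrete, here is a minimal sketch using NumPy only. The data are simulated, and the names `fit_pca`, `project`, `ref` and `new` are hypothetical and only for illustration. It assumes both matrices are samples x genes, cover the same genes in the same order, and are already log-transformed. It only shows the mechanics of the projection and does nothing to remove protocol differences.

```python
import numpy as np

def fit_pca(X, n_components=2):
    """Fit PCA on a samples x genes matrix; return gene means and loadings."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes (loadings) of the centred data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(X, mean, components):
    """Project samples using the means and loadings of the reference dataset."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
ref = rng.normal(size=(20, 50))        # simulated reference dataset (e.g. total RNA)
new = rng.normal(size=(10, 50)) + 1.5  # simulated second dataset with a constant offset

mean, comps = fit_pca(ref)
ref_scores = project(ref, mean, comps)  # coordinates of the reference samples
new_scores = project(new, mean, comps)  # second dataset placed in the same space
```

Note that the second dataset is centred with the reference means, not its own. Any systematic shift between the two experiments therefore stays in the projected coordinates.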
This is usually not possible. When the studies are unrelated, there is no way to tell which differences are biological and which are technical (batch) effects, because patient group, experiment and library protocol all change together. I suggest you avoid this sort of analysis and look for studies that allow a proper comparison, meaning the groups you want to compare are in the same dataset, from the same lab, with the same protocol, same everything. You're just chasing ghosts when working with this sort of confounded data. Nothing to be gained from it, really.
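The confounding described above can be illustrated with a small design matrix. This is only a sketch with a hypothetical layout of eight samples and made-up labels, not data from either study. When every sample of a patient group comes from the same study, the group and batch columns are identical, so a linear model cannot estimate the two effects separately.

```python
import numpy as np

# Hypothetical layout: 4 samples per study, and each study holds exactly one group.
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
batch = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # identical to group: fully confounded

# Design matrix with intercept, group and batch columns.
design = np.column_stack([np.ones(8), group, batch])
rank_confounded = np.linalg.matrix_rank(design)  # 2, although there are 3 columns

# For contrast: both groups are present in both batches.
batch_balanced = np.array([0, 0, 1, 1, 0, 0, 1, 1])
design_balanced = np.column_stack([np.ones(8), group, batch_balanced])
rank_balanced = np.linalg.matrix_rank(design_balanced)  # 3, full column rank
```

With the rank-deficient design, any split of the observed difference between "group" and "batch" fits the data equally well, which is why no normalisation can recover the biological part.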
No (some more words to avoid an error while submitting this comment)