I have three datasets from three different tissues of the same donor, all processed the same way and sequenced on the same platform. I am interested in exploring potential differences in the cell subsets and their expression profiles across these tissues. According to the various scRNA-seq analysis tutorials, working with multiple datasets usually calls for integration. However, if I understand correctly, the point of integrating datasets is to remove batch effects associated with a given covariate.
I was wondering whether it makes sense at all to perform data integration in my case, since each “batch” that I have is a different tissue, i.e. a covariate that I do not want to eliminate, because I am interested in the variation arising from it. At the same time, I understand that despite a similar protocol and the same sequencing platform, there could still be unwanted technical variation. But if I do integrate these datasets, the biological variation due to tissue differences would automatically be removed as well, correct?
What would be the correct workflow in a scenario like this? Should I first perform the analysis without integration and, unless I clearly see tissue-specific clusters, skip integration altogether?
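
To make the "tissue-specific clusters" check concrete, here is a minimal sketch of how I imagine quantifying it after clustering without integration. It assumes I already have one cluster label and one tissue label per cell as plain arrays. The function names, the 0.9 threshold, and the toy labels (including the tissue names) are all hypothetical and only for illustration. As far as I can tell, this would only show whether clusters separate by tissue, not whether that separation is technical or biological.

```python
import numpy as np

def tissue_fraction_per_cluster(clusters, tissues):
    """Return (cluster_ids, tissue_ids, fractions), where fractions[i, j] is
    the share of cells in cluster i that come from tissue j."""
    clusters = np.asarray(clusters)
    tissues = np.asarray(tissues)
    # Map each label to an integer index (ids are returned in sorted order).
    cluster_ids, c_idx = np.unique(clusters, return_inverse=True)
    tissue_ids, t_idx = np.unique(tissues, return_inverse=True)
    # Count cells for every (cluster, tissue) pair.
    counts = np.zeros((len(cluster_ids), len(tissue_ids)), dtype=float)
    np.add.at(counts, (c_idx, t_idx), 1)
    # Normalize each row so that it sums to 1.
    fractions = counts / counts.sum(axis=1, keepdims=True)
    return cluster_ids, tissue_ids, fractions

def tissue_specific_clusters(clusters, tissues, threshold=0.9):
    """List clusters in which a single tissue contributes at least
    `threshold` of the cells (the threshold is an arbitrary assumption)."""
    cluster_ids, _, fractions = tissue_fraction_per_cluster(clusters, tissues)
    dominant = fractions.max(axis=1) >= threshold
    return cluster_ids[dominant].tolist()

# Toy example with made-up labels for ten cells.
toy_clusters = ["0", "0", "0", "0", "1", "1", "1", "2", "2", "2"]
toy_tissues = ["lung", "lung", "liver", "spleen", "lung",
               "lung", "lung", "liver", "spleen", "liver"]
print(tissue_specific_clusters(toy_clusters, toy_tissues))
```

In the toy data only cluster "1" consists entirely of cells from one tissue, so it is the only one flagged. With real data I would feed in the cluster and tissue columns from the cell metadata instead.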