There is a question about Seurat that has been bothering me for a long time, and I would like to ask for your opinions. With multiple samples, Seurat integrates the samples before clustering in order to remove batch effects, which is fine. But when searching for markers, it uses the raw (unintegrated) data matrix, so that real differences are not erased by the integration. However, when the markers are displayed in a heatmap, the integrated data matrix is used for plotting, so many genes that were found to be differentially expressed show no visible difference in the plot.

The same issue arises when looking for differentially expressed genes and in subsequent functional analyses such as GSVA or receptor-ligand pairs: should these analyses use the raw or the integrated data? If the integrated data is used, will it affect the identification of differences? And if the original matrix is used, will there be a batch effect?
Just as a caution: in my experience, Seurat MultiCCA over-integrates the data. I have been advised to use fastMNN, which I also find to be a better alternative.
The fastMNN method is described in detail in the Bioconductor scRNA-seq book.

There are a lot of methods, and their performance varies with the dataset. See the previous discussion about batch correction in scRNA-seq for some examples.
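
To make the raw-versus-integrated question concrete, here is a minimal Python sketch (toy numbers, not Seurat or fastMNN code) of one common way to handle batch effects when testing on uncorrected values: compute the cluster difference within each batch and then average across batches, instead of pooling all cells. The `cells` values and the function names `pooled_difference` and `blocked_difference` are hypothetical, made up for illustration.

```python
from statistics import mean

# Toy log-expression values for one gene: (batch, cluster, value).
# Hypothetical numbers: within each batch, cluster "A" is 1.0 higher than "B",
# and batch "b2" carries an additive offset of 2.0 relative to "b1".
cells = [
    ("b1", "A", 3.0), ("b1", "A", 3.0), ("b1", "A", 3.0), ("b1", "B", 2.0),
    ("b2", "A", 5.0), ("b2", "B", 4.0), ("b2", "B", 4.0), ("b2", "B", 4.0),
]


def pooled_difference(cells):
    """Mean of A minus mean of B over all cells, ignoring batch."""
    a_values = [value for _, cluster, value in cells if cluster == "A"]
    b_values = [value for _, cluster, value in cells if cluster == "B"]
    return mean(a_values) - mean(b_values)


def blocked_difference(cells):
    """Mean of A minus mean of B computed within each batch, then averaged."""
    per_batch = []
    for batch in sorted({batch_id for batch_id, _, _ in cells}):
        a_values = [v for b, c, v in cells if b == batch and c == "A"]
        b_values = [v for b, c, v in cells if b == batch and c == "B"]
        # Skip batches that do not contain both clusters.
        if a_values and b_values:
            per_batch.append(mean(a_values) - mean(b_values))
    return mean(per_batch)


print("pooled: ", pooled_difference(cells))
print("blocked:", blocked_difference(cells))
```

With these toy values, pooling gives a difference of 0.0 because the batch offset combined with unequal cluster proportions cancels the signal, while the per-batch comparison recovers 1.0. Real tools use proper statistical tests rather than a plain difference of means; this only illustrates the idea of comparing within batches on uncorrected data.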