I have single-cell datasets comprising 5 samples from two different tissues (2 blood samples and 3 urothelial samples). After quality control, I have 41070 cells in total.
I tried to integrate these datasets using CCA, MNN, Harmony and BBKNN in turn, and then performed dimensionality reduction (PCA) and clustering, followed by cell type annotation. When I checked the results, 500-700 blood cells (about 5-6%) were mistakenly annotated as epithelium, regardless of which of these integration methods I used. I re-checked these cells by clustering each sample on its own and confirmed that there are no epithelial cells in the blood samples.
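One way to quantify this kind of cross-tissue mislabelling is to tabulate each cell's tissue of origin against its assigned label. The sketch below uses only the Python standard library; the tissue and label values are made-up placeholders, not data from this post. With an AnnData object the same table would typically come from pandas.crosstab on two columns of adata.obs (the column names would be whatever your pipeline uses).

```python
from collections import Counter

def label_counts_by_tissue(tissues, labels):
    """Count (tissue, label) pairs so that cells labelled with a type
    that should not exist in their tissue stand out."""
    return Counter(zip(tissues, labels))

# Hypothetical toy data: one tissue and one annotation per cell.
tissues = ["blood", "blood", "blood", "urothelium", "urothelium"]
labels = ["T cell", "Epithelium", "Monocyte", "Epithelium", "Epithelium"]

counts = label_counts_by_tissue(tissues, labels)
n_blood = sum(v for (tissue, _), v in counts.items() if tissue == "blood")
n_blood_epi = counts[("blood", "Epithelium")]
print(f"{n_blood_epi}/{n_blood} blood cells labelled Epithelium")
```

Any nonzero count for ("blood", "Epithelium") flags exactly the cells that need a closer look.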
Is it appropriate to integrate datasets from different tissues? If so, how can I improve the integration to get a cleaner integrated dataset?
What did you use for cell type prediction? Are you predicting on individual cells or on whole clusters? If the latter, the whole cluster will be labeled with its most prevalent cell type, even if other cell types are present in that cluster. If the former, what are the scores for the epithelial cell types, and how do those scores compare with the scores for the other cell types?
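To illustrate the difference between per-cell and per-cluster labeling, here is a minimal NumPy sketch. It assumes you already have a cells-by-cell-types score matrix (from whatever annotation tool you used) and a cluster assignment per cell; the function names and toy numbers are hypothetical. A cell whose own best score is "T cell" still gets called "Epithelium" under majority voting if it sits in a mostly epithelial cluster, and the score margin shows how confidently each cell is called epithelial.

```python
import numpy as np

def per_cell_labels(scores, cell_types):
    """Assign each cell (row) the cell type with the highest score."""
    return np.asarray(cell_types)[np.argmax(scores, axis=1)]

def cluster_majority_labels(cell_labels, clusters):
    """Give every cell the most frequent per-cell label in its cluster."""
    cell_labels = np.asarray(cell_labels)
    clusters = np.asarray(clusters)
    out = np.empty_like(cell_labels)
    for c in np.unique(clusters):
        mask = clusters == c
        values, counts = np.unique(cell_labels[mask], return_counts=True)
        out[mask] = values[np.argmax(counts)]
    return out

def score_margin(scores, cell_types, target):
    """Target-type score minus the best score among all other types, per cell.
    Negative values mean some other type scores higher than the target."""
    idx = list(cell_types).index(target)
    others = np.delete(scores, idx, axis=1)
    return scores[:, idx] - others.max(axis=1)

# Hypothetical toy example: 5 cells, 2 candidate types, 2 clusters.
cell_types = ["T cell", "Epithelium"]
scores = np.array([[0.1, 0.9],
                   [0.2, 0.8],
                   [0.7, 0.3],   # a T cell sitting in the epithelial cluster
                   [0.9, 0.1],
                   [0.8, 0.2]])
clusters = np.array([0, 0, 0, 1, 1])

per_cell = per_cell_labels(scores, cell_types)
by_cluster = cluster_majority_labels(per_cell, clusters)
margin = score_margin(scores, cell_types, "Epithelium")
print(per_cell)
print(by_cluster)
print(margin.round(2))
```

Cell 2 is "T cell" in per_cell but "Epithelium" in by_cluster, and its negative margin shows the epithelial call comes only from the cluster vote.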