I have a theoretical question: what would happen in a UMAP embedding of three datasets of variable quality from the same sample type? Two of these datasets were generated on the 10x platform and the third with SmartSeq2. The metrics for each dataset are below.
Dataset    Batch  Num. cells  Median UMIs/cell  Median genes/cell
10X        REP1   4597        12411             3217
10X        REP2   5111        7121              1985
SmartSeq   REP1   1239        1550093           11305
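It is clear that the SmartSeq sample has captured a lot more information per cell than the two 10X replicates: roughly 3.5 to 5.7 times as many genes per cell at the median, and more than 100 times as many counts.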
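To put a number on that gap, here is a quick sketch that computes the fold differences from the medians in the table above. The dictionary layout and variable names are just mine for illustration. One caveat: SmartSeq2 counts are normally reads rather than UMIs, so the counts-per-cell ratio is probably not a like-for-like comparison, and the genes-per-cell ratio is the more meaningful of the two.

```python
# Median per-cell metrics copied from the table above.
# Note: the SmartSeq "counts" are likely reads, not UMIs (assumption).
metrics = {
    "10X REP1": {"median_counts": 12411, "median_genes": 3217},
    "10X REP2": {"median_counts": 7121, "median_genes": 1985},
    "SmartSeq REP1": {"median_counts": 1550093, "median_genes": 11305},
}

reference = metrics["SmartSeq REP1"]

# Fold difference of the SmartSeq medians over each 10X replicate.
gene_fold = {}
count_fold = {}
for name, m in metrics.items():
    if name == "SmartSeq REP1":
        continue
    gene_fold[name] = reference["median_genes"] / m["median_genes"]
    count_fold[name] = reference["median_counts"] / m["median_counts"]
    print(
        f"{name}: {gene_fold[name]:.1f}x genes/cell, "
        f"{count_fold[name]:.0f}x counts/cell"
    )
```

The two 10X replicates also differ from each other in depth, so some separation between REP1 and REP2 would not be surprising either.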
I can imagine that in a UMAP embedding of these datasets, the SmartSeq sample would form well-defined clusters, whereas the two 10X replicates (owing to their lower per-cell coverage) may form false clusters that do not overlap with the SmartSeq sample. Am I right in thinking this?
Just to expand on the answer: both Seurat (link) and Scanpy (link) provide integration workflows, so you can use whichever you prefer.