Question

UMAP of 10X and SmartSeq2 datasets on same plot

0

3.8 years ago

a.rex ▴ 350

I have a theoretical question: what would happen in a UMAP embedding of three datasets of the same sample type but of variable quality? Two of these datasets were generated using the 10x platform, and the third using SmartSeq2. The metrics of the datasets are below.

Dataset batch    num cells    med. UMI/cell    med. genes/cell
10X REP1              4597            12411               3217
10X REP2              5111             7121               1985
SmartSeq REP1         1239          1550093              11305

It is clear that the SmartSeq sample has captured a lot more information compared to the two 10X replicates.
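To put rough numbers on that gap, here is a minimal sketch that computes fold differences from the table above, using 10X REP1 as the reference. The dictionary layout and the function name `fold_over` are just illustrative choices. One caveat, stated as an assumption: SmartSeq2 libraries normally do not carry UMIs, so the "umi/cell" value for that batch is most likely a read count and is not directly comparable to the 10X UMI counts.

```python
# Median per-cell metrics copied from the table in the question.
# "counts" is UMIs for 10X; for SmartSeq2 it is assumed to be reads.
metrics = {
    "10X REP1": {"counts": 12411, "genes": 3217},
    "10X REP2": {"counts": 7121, "genes": 1985},
    "SmartSeq REP1": {"counts": 1550093, "genes": 11305},
}


def fold_over(reference, key):
    """Fold difference of every batch relative to `reference` for one metric."""
    base = metrics[reference][key]
    return {name: m[key] / base for name, m in metrics.items()}


gene_fold = fold_over("10X REP1", "genes")
count_fold = fold_over("10X REP1", "counts")

for name in metrics:
    print(f"{name}: {gene_fold[name]:.2f}x genes, {count_fold[name]:.1f}x counts")
```

The gene-level fold difference is the more meaningful of the two, because the count columns mix two different units.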

I can imagine that in a UMAP embedding of these datasets, the SmartSeq sample would form well-defined clusters, whereas the two 10x replicates (owing to their lower depth) may form false clusters that do not overlap with the SmartSeq sample. Am I right in thinking this?

smartseq2 singlecellrna-seq umap 10xgenomics • 2.6k views

score 2 · Answer 1 · 2021-03-22

3.8 years ago

ATpoint 86k

You have to perform integration to correct for the technical batch effect.

See our holy bible: http://bioconductor.org/books/release/OSCA/integrating-datasets.html

Edit: Or the Seurat / Scanpy equivalents linked below by rpolicastro.
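To illustrate why integration matters, here is a toy sketch in plain Python. It is not the OSCA, Seurat or Scanpy workflow. It shows only the simplest kind of linear correction (centering each batch on its own mean) for a single gene, with invented numbers. It also assumes both batches contain the same cell types in the same proportions, which real data rarely guarantee. That is one reason the dedicated integration methods exist.

```python
from statistics import mean

# Toy log-expression values for ONE gene; all numbers are invented.
# Both batches hold the same two cell types (A, B), but the second
# batch ("SS2") carries a constant technical offset of +3.
cells = [
    # (batch, cell_type, log_expression)
    ("10X", "A", 1.0), ("10X", "A", 1.2), ("10X", "B", 3.0), ("10X", "B", 3.2),
    ("SS2", "A", 4.0), ("SS2", "A", 4.2), ("SS2", "B", 6.0), ("SS2", "B", 6.2),
]


def center_per_batch(rows):
    """Subtract each batch's mean so every batch is centered on zero."""
    batch_means = {
        b: mean(v for bb, _, v in rows if bb == b)
        for b in {bb for bb, _, _ in rows}
    }
    return [(b, t, v - batch_means[b]) for b, t, v in rows]


def type_gap_between_batches(rows, cell_type):
    """Distance between the two batches' means for one cell type."""
    per_batch = [
        mean(v for b, t, v in rows if b == batch and t == cell_type)
        for batch in ("10X", "SS2")
    ]
    return abs(per_batch[0] - per_batch[1])


corrected = center_per_batch(cells)
gap_before = type_gap_between_batches(cells, "A")
gap_after = type_gap_between_batches(corrected, "A")
print("type A gap between batches before centering:", gap_before)
print("type A gap between batches after centering: ", gap_after)
```

Before centering, cells of the same type sit far apart purely because of the batch offset, which is what would split them into platform-specific clusters in a UMAP. After centering, the same cell type lines up across batches while the difference between types A and B is preserved.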

1

Just to expand on the answer, both Seurat (link) and Scanpy (link) also have integration workflows depending on your preference.

Reply by rpolicastro 13k • 3.8 years ago