Question

Some questions about samples, integration, and subclustering



3.1 years ago

butang ▴ 20

Hi everyone, I'm new to scRNA-seq. I have read some posts about it, but several questions still confuse me:

How do I identify a bad sample (not a bad cell), and should a bad sample be discarded?
I have 3 samples of a specific tumor type from different patients. Should the three samples simply be merged, or should they be integrated after merging? In bulk RNA-seq they are simply merged without considering batch effects, so I don't know whether the same applies to scRNA-seq.
If I do the integration and continue with the workflow, which data should I use when I need to subcluster a specific cell type (e.g. fibroblasts)? Should I rerun the workflow on the original data for that subset, or use the integrated data for that subset?

subcluster integration • 990 views


score 5 · Answer 1 · 2021-11-08

This assumes Seurat, since you didn't specify and it's one of the more popular packages for scRNA-seq analysis.

There are many ways to end up with a bad sample, so it's probably better to think about what you expect from a run. Your Illumina run should have roughly the number of reads you aimed for, and before filtering cells you ideally want to see at least 40% of the expected cell number and a decent sequencing saturation value. Most of your cells should have a low mitochondrial read percentage (less than about 25%) and, for 10X runs, a reasonably high number of detected features per cell (500-5000 or so) and a decent number of UMIs per cell (1000-20000). Your doublet rate should be less than 10% too. Finally, after processing and dimension reduction you should be able to clearly identify the tissues/cell types you expect to separate via scRNA-seq.
Integration is a type of merging that accounts for batch effects (for clustering and dimension reduction), so keep the samples separate until integration.
Any clustering or dimension reduction should be performed on the integrated data values. Any other type of analysis (like finding markers) should use the log-normalized values.
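
To make the per-cell ranges from the first point a bit more concrete, here is a minimal sketch in plain Python (not Seurat code) that applies them to a handful of made-up cells and checks cell recovery before filtering. The `Cell` class, the function names, and the example values are all hypothetical and only for illustration; the cutoffs are just the rough numbers quoted above and should be tuned per tissue. In Seurat the corresponding per-cell metadata are usually `nFeature_RNA`, `nCount_RNA`, and a mitochondrial percentage column you compute yourself.

```python
from dataclasses import dataclass

# Rough ranges quoted in the answer above; adjust them for your tissue and chemistry.
MAX_PCT_MITO = 25.0          # mitochondrial read percentage should be below this
FEATURE_RANGE = (500, 5000)  # detected features (genes) per cell
UMI_RANGE = (1000, 20000)    # UMIs per cell
MIN_RECOVERY = 0.40          # called cells / expected cells, before filtering


@dataclass
class Cell:
    """Hypothetical container for per-cell QC metrics."""
    barcode: str
    n_features: int
    n_umis: int
    pct_mito: float


def passes_qc(cell: Cell) -> bool:
    """Return True if the cell falls inside all three rough QC ranges."""
    return (
        cell.pct_mito < MAX_PCT_MITO
        and FEATURE_RANGE[0] <= cell.n_features <= FEATURE_RANGE[1]
        and UMI_RANGE[0] <= cell.n_umis <= UMI_RANGE[1]
    )


def summarize_sample(cells, expected_cells):
    """Summarize one sample: recovery before filtering, and cells kept after."""
    kept = [c for c in cells if passes_qc(c)]
    recovery = len(cells) / expected_cells
    return {
        "n_called": len(cells),
        "n_kept": len(kept),
        "fraction_kept": len(kept) / len(cells) if cells else 0.0,
        "recovery": recovery,
        "recovery_ok": recovery >= MIN_RECOVERY,
    }


# Made-up example cells, not real data.
cells = [
    Cell("AAAC", 2100, 6500, 4.2),
    Cell("AAAG", 310, 700, 3.0),     # too few features and UMIs
    Cell("AACT", 1800, 5200, 41.0),  # high mitochondrial percentage
    Cell("AAGT", 3300, 12000, 8.9),
]
summary = summarize_sample(cells, expected_cells=5)
print(summary)
```

A sample where most cells fail these checks, or where recovery is far below what you loaded, is a candidate for a closer look rather than automatic removal; the thresholds are a starting point, not hard rules.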