Question

the analysis of multiple samples of 10X scRNA-seq

5.9 years ago

Bogdan ★ 1.4k

Dear all, greetings

I'd like to ask for a piece of advice, please: we have 3 scRNA-seq samples that were sequenced at different depths (200 million, 800 million, and 900 million reads), and consequently we see:

-- distinct numbers of cells, and

-- (on average) distinct number of genes/cell, depending on the sample

Would integrating these samples with cellranger aggr be a good approach (it also normalizes the samples), followed by a standard analysis of the aggregated samples with Seurat or the simpleSingleCell pipeline?

thank you very much,

-- bogdan

scRNAseq scRNA-seq • 6.4k views

updated 4.4 years ago by ATpoint 89k • written 5.9 years ago by Bogdan ★ 1.4k

score 9 · Accepted Answer · 2019-10-14

I would:

Option 1:

Independently quantify genes for each sample --> Normalize to 10,000 counts per cell (the default in most scRNA-seq analyses) or use SCTransform --> Square-root transform the matrix (instead of log2(counts+1)) --> Merge the three matrices (cbind) --> Remove genes that are expressed in fewer than 1, 2, or 5% of the cells --> Use ComBat to remove batch effects (here, three batches) --> Import the matrix into Seurat --> Skip normalisation --> PCA/UMAP/clustering etc. I am pretty sure the cells will then cluster by cell type rather than by sample.
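As a rough illustration of the first steps above (scaling each cell to 10,000, square-root transform, cbind-style merge, gene filter), here is a minimal sketch in plain Python. The function names, the dict-of-lists matrix layout, and the toy counts are all made up for the example, and "detected" is assumed to mean a value greater than zero. The ComBat step is not shown; in practice it would come from a dedicated package and run on the merged, filtered matrix together with the batch labels.

```python
import math

def normalize_sqrt(counts, target=10_000):
    """counts: dict gene -> list of raw counts, one entry per cell.
    Scale every cell to `target` total counts, then take the square root."""
    genes = list(counts)
    n_cells = len(next(iter(counts.values())))
    totals = [sum(counts[g][c] for g in genes) for c in range(n_cells)]
    return {
        g: [math.sqrt(counts[g][c] * target / totals[c]) if totals[c] else 0.0
            for c in range(n_cells)]
        for g in genes
    }

def merge_samples(samples):
    """Column-bind per-sample matrices on their shared genes (like cbind).
    Returns the merged matrix and one batch label per cell."""
    names = list(samples)
    shared = [g for g in samples[names[0]]
              if all(g in samples[n] for n in names)]
    merged = {g: [] for g in shared}
    batches = []
    for name in names:
        mat = samples[name]
        n_cells = len(next(iter(mat.values())))
        for g in shared:
            merged[g].extend(mat[g])
        batches.extend([name] * n_cells)
    return merged, batches

def filter_genes(matrix, min_frac=0.05):
    """Keep genes detected (value > 0) in at least `min_frac` of the cells."""
    return {g: v for g, v in matrix.items()
            if sum(x > 0 for x in v) / len(v) >= min_frac}

# Toy raw counts: two hypothetical samples, three genes.
raw = {
    "s1": {"GeneA": [3, 0], "GeneB": [1, 4], "GeneC": [0, 0]},
    "s2": {"GeneA": [10, 2, 0], "GeneB": [0, 8, 5], "GeneC": [0, 0, 0]},
}
normalized = {name: normalize_sqrt(mat) for name, mat in raw.items()}
merged, batches = merge_samples(normalized)
filtered = filter_genes(merged, min_frac=0.05)
```

Normalizing each sample before merging keeps the per-cell scaling independent of the other samples; `batches` is the vector a batch-correction tool would need next.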

If you want to see gene expression changes across clusters, I would add an extra imputation step. So it would be:

Independently quantify genes for each sample --> Normalize to 10,000 counts per cell (the default in most scRNA-seq analyses) --> Square-root transform the matrix (instead of log2(counts+1)) or use SCTransform --> Merge the three matrices --> Remove genes that are expressed in fewer than 1, 2, or 5% of the cells --> Use ComBat to remove batch effects (here, three batches) --> Impute gene expression (for example with MAGIC) --> Import the matrix into Seurat --> Skip normalisation --> PCA/UMAP/clustering etc.

Take the average gene expression for each cluster and calculate a cluster specificity score (the Tau score, for example), then take the genes with a Tau score above 0.5 or 0.3 and perform K-means clustering of the averaged gene expression across clusters to pick markers.
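To make the Tau step concrete, here is a small sketch in plain Python. It assumes the usual definition of the Tau index, tau = sum_i (1 - x_i / max(x)) / (n - 1), where x_i is a gene's average expression in cluster i and n is the number of clusters. The function names and toy values are invented for the example, and the K-means step is left out.

```python
def cluster_means(expr, labels):
    """Average each gene's expression over the cells of each cluster.
    expr: dict gene -> list of values per cell; labels: cluster id per cell."""
    clusters = sorted(set(labels))
    means = {}
    for gene, values in expr.items():
        means[gene] = []
        for k in clusters:
            members = [v for v, lab in zip(values, labels) if lab == k]
            means[gene].append(sum(members) / len(members))
    return clusters, means

def tau(profile):
    """Tau specificity: 0 = same level in every cluster, 1 = one cluster only.
    Assumes non-negative averages."""
    n = len(profile)
    top = max(profile)
    if n < 2 or top <= 0:
        return 0.0
    return sum(1 - x / top for x in profile) / (n - 1)

# Toy data: six cells in three clusters, three hypothetical genes.
expr = {
    "Marker":       [9, 11, 0, 0, 0, 0],
    "Housekeeping": [5, 5, 5, 5, 5, 5],
    "Partial":      [8, 8, 4, 4, 0, 0],
}
labels = [0, 0, 1, 1, 2, 2]

clusters, means = cluster_means(expr, labels)
scores = {gene: tau(profile) for gene, profile in means.items()}
candidates = [gene for gene, s in scores.items() if s > 0.5]
```

One caveat: Tau is defined for non-negative values, and batch-corrected values may be negative, so it may be safer to compute the cluster averages on the normalized (or imputed) matrix rather than on the corrected one.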

Option 2: Use the Seurat (v3) CCA-based integration to combine the datasets. It is straightforward. It can use SCTransform instead of library-size normalisation, which seems to work better for scRNA-seq data, but that depends on the end goal.

Option 3: If you want to use Seurat's default differential analysis, start with the raw counts but use the kNN graph from the analysis above, then proceed with the usual marker analysis or differential gene expression analysis.

It's all custom analysis, but it works pretty well and it's fun.