the analysis of multiple samples of 10X scRNA-seq
5.1 years ago
Bogdan ★ 1.4k

Dear all, greetings

I'd like to ask for a piece of advice, please: we have 3 scRNA-seq samples that were sequenced at different depths (200 million, 800 million, and 900 million reads), and consequently we see:

-- different numbers of cells, and

-- (on average) a different number of genes per cell, depending on the sample

Would integrating these samples with CELLRANGER AGGR be a good approach (it normalizes the samples too), followed by standard analysis of the aggregated samples with Seurat or the simpleSingleCell pipeline?

thank you very much,

-- bogdan

scRNAseq scRNA-seq • 5.9k views
5.1 years ago

I would:

Option 1:

Quantify genes independently for each sample --> normalize to 10,000 reads per cell (the default in most scRNA analyses) or use SCTransform --> square-root transform the matrix (instead of log2(counts+1)) --> merge the three matrices (cbind) --> remove genes expressed in fewer than 1, 2, or 5% of the cells --> use ComBat to remove batch effects (here, three batches) --> import the matrix into Seurat --> skip normalisation --> PCA/UMAP/clustering, etc. I am pretty sure the cells will then cluster by cell type rather than by sample.
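The first few steps of this pipeline (per-sample normalization, square-root transform, merge, gene filter) can be sketched in a few lines. This is only a minimal illustration with NumPy on simulated counts, not the Seurat or ComBat workflow itself: the function names `normalize_sqrt` and `merge_and_filter`, the genes x cells layout, and the 5% cutoff are assumptions made for the example, and batch correction is left out.

```python
import numpy as np

def normalize_sqrt(counts, target=10_000):
    """Scale each cell (column) to `target` total counts, then take the square root.

    `counts` is assumed to be a genes x cells matrix of raw counts for one sample.
    """
    counts = np.asarray(counts, dtype=float)
    depth = counts.sum(axis=0, keepdims=True)
    depth[depth == 0] = 1.0          # avoid division by zero for empty cells
    return np.sqrt(counts / depth * target)

def merge_and_filter(samples, min_frac=0.01):
    """Column-bind samples (same gene order assumed) and drop rarely detected genes."""
    merged = np.hstack(samples)      # equivalent of cbind in R
    # One batch label per cell, to be passed to a batch-correction step later.
    batch = np.concatenate([np.full(s.shape[1], i) for i, s in enumerate(samples)])
    detected_frac = (merged > 0).mean(axis=1)
    keep = detected_frac >= min_frac
    return merged[keep], batch, keep

# Simulated stand-in for three samples sequenced at different depths.
rng = np.random.default_rng(0)
raw = [rng.poisson(lam, size=(50, n)) for lam, n in [(0.5, 30), (2.0, 40), (2.5, 35)]]

normed = [normalize_sqrt(r) for r in raw]
merged, batch, keep = merge_and_filter(normed, min_frac=0.05)
print(merged.shape, np.bincount(batch))
```

The `batch` vector is what a batch-correction tool such as ComBat would take as its batch covariate in the next step.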

If you want to see gene expression changes across clusters, I would introduce an extra imputation step. So it would be:

Quantify genes independently for each sample --> normalize to 10,000 reads per cell (the default in most scRNA analyses) --> square-root transform the matrix (instead of log2(counts+1)) or use SCTransform --> merge the three matrices --> remove genes expressed in fewer than 1, 2, or 5% of the cells --> use ComBat to remove batch effects (here, three batches) --> impute gene expression (for example with MAGIC) --> import the matrix into Seurat --> skip normalisation --> PCA/UMAP/clustering, etc.

Then average the gene expression within each cluster and calculate a cluster specificity score (the Tau score, for example); take the genes with a Tau score above 0.5 or 0.3 and perform K-means clustering on the averaged expression across clusters to pick markers.
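For reference, the Tau specificity score for a gene with average expression x_i in each of n clusters is commonly defined as tau = sum_i (1 - x_i / max(x)) / (n - 1), so 0 means uniform expression and 1 means expression in a single cluster. A minimal sketch with NumPy follows; the function name `tau_score`, the toy matrix, and the choice to return 0 for genes with no expression anywhere are assumptions made for illustration, and the values are assumed to be non-negative.

```python
import numpy as np

def tau_score(cluster_means):
    """Tau specificity per gene from a genes x clusters matrix of mean expression."""
    x = np.asarray(cluster_means, dtype=float)
    n_clusters = x.shape[1]
    row_max = x.max(axis=1, keepdims=True)
    safe_max = np.where(row_max > 0, row_max, 1.0)   # guard all-zero genes
    tau = (1.0 - x / safe_max).sum(axis=1) / (n_clusters - 1)
    tau[row_max[:, 0] == 0] = 0.0                    # assumption: unexpressed gene -> 0
    return tau

# Toy cluster averages: a uniform gene, a single-cluster gene, and an intermediate one.
means = np.array([
    [5.0, 5.0, 5.0, 5.0],
    [8.0, 0.0, 0.0, 0.0],
    [4.0, 2.0, 0.0, 0.0],
])
tau = tau_score(means)
candidate_markers = np.where(tau > 0.5)[0]   # 0.5 cutoff as suggested above
print(tau, candidate_markers)
```

The rows that pass the cutoff would then be the input to the K-means step on the averaged expression.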

Option 2: Use Seurat (v3) CCA-based integration to combine the datasets. Straightforward. It can use SCTransform instead of library-size normalisation, which seems to work better for scRNA data, but it depends on the end goal.

Option 3: If you want to use Seurat's default differential analysis, start with the raw counts but use the kNN graph from the analysis above, then proceed with typical marker analysis or differential gene expression analysis.

It's all custom analysis, but it works pretty well and it's fun.

Thanks a lot for the very detailed suggestions!

About the meaning of batches: if I have 3 groups (4 normal, 8 disease, and 8 treatment; 20 samples in total), does that mean I have 20 batches? Thank you.

Most likely yes.
