Part-1 here: Single-cell RNA-seq: Annotation: Celltype auto annotation with SCSA
Part-3 here: Single-cell RNA-seq: Annotation: Celltype auto annotation with SCSA-3
Full article lifted from: https://omicverse.readthedocs.io/en/latest/Tutorials-single/t_cellanno/
We can also use PanglaoDB as the target database to annotate the cell types:
scsa=ov.single.pySCSA(adata=adata,
                      foldchange=1.5,
                      pvalue=0.01,
                      celltype='normal',
                      target='panglaodb',
                      tissue='All',
                      model_path='temp/pySCSA_2023_v2_plus.db'
)
res=scsa.cell_anno(clustertype='leiden',
                   cluster='all',rank_rep=True)
ranking genes
finished (0:00:01)
...Auto annotate cell
Version V2.1 [2023/06/27]
DB load: GO_items:47347,Human_GO:3,Mouse_GO:3,
CellMarkers:82887,CancerSEA:1574,PanglaoDB:24223
Ensembl_HGNC:61541,Ensembl_Mouse:55414
<omicverse.single._SCSA.Annotator object at 0x2c89d0250>
Version V2.1 [2023/06/27]
DB load: GO_items:47347,Human_GO:3,Mouse_GO:3,
CellMarkers:82887,CancerSEA:1574,PanglaoDB:24223
Ensembl_HGNC:61541,Ensembl_Mouse:55414
load markers: 70276
Cluster 0 Gene number: 75
Other Gene number: 701
Cluster 1 Gene number: 144
Other Gene number: 672
Cluster 10 Gene number: 123
Other Gene number: 675
Cluster 2 Gene number: 515
Other Gene number: 667
Cluster 3 Gene number: 129
Other Gene number: 664
Cluster 4 Gene number: 82
Other Gene number: 701
Cluster 5 Gene number: 931
Other Gene number: 612
Cluster 6 Gene number: 243
Other Gene number: 663
Cluster 7 Gene number: 5
Other Gene number: 712
Cluster 8 Gene number: 67
Other Gene number: 712
Cluster 9 Gene number: 587
Other Gene number: 670
#Cluster Type Celltype Score Times
['0', '?', 'T Cells|T Memory Cells', '3.7178236936447595|3.363511285637478', 1.1053400384057666]
['1', '?', 'T Cells|T Memory Cells', '3.531461316580769|3.117745502299072', 1.1326971088488835]
['10', 'Good', 'Platelets', 7.432982147841601, 2.4362620739591723]
['2', '?', 'Monocytes|Alveolar Macrophages', '3.6455690759322272|2.922079559631339', 1.2475940512694892]
['3', '?', 'B Cells Naive|B Cells Memory', '4.328429443730031|3.9545790694449328', 1.0945360726691886]
['4', '?', 'NK Cells|T Cells', '2.96702571977588|2.680719557955109', 1.1068019819421802]
['5', '?', 'Monocytes|Macrophages', '3.774802905873644|2.8442136919017464', 1.327186813220659]
['6', '?', 'NK Cells|Decidual Cells', '4.137392815321161|2.880539867188994', 1.4363254827501069]
['7', '?', 'Decidual Cells|NK Cells', '1.6543443520623498|1.6543443520623498', 1.0]
['8', '?', 'Monocytes|Alveolar Macrophages', '2.7222000722461677|2.1616279260313114', 1.2593286936499064]
['9', '?', 'Dendritic Cells|Langerhans Cells', '4.146171305699119|3.728571260268892', 1.1120000172398774]
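The rows above can be read programmatically. The following is a minimal sketch, not part of pySCSA: the helper name `summarise` is hypothetical, and the rows are copied by hand from the printed output. It splits the `|`-separated candidates, takes the first one as the top label, and recomputes the ratio of the top score to the second score, which in these rows matches the value printed in the `Times` column. For clusters marked `Good`, only the top candidate is printed, so the ratio cannot be recomputed from the row.

```python
# Rows copied from the SCSA output above: (cluster, type, cell types, scores).
rows = [
    ("0", "?", "T Cells|T Memory Cells", "3.7178236936447595|3.363511285637478"),
    ("10", "Good", "Platelets", "7.432982147841601"),
    ("7", "?", "Decidual Cells|NK Cells", "1.6543443520623498|1.6543443520623498"),
]

def summarise(cell_types, scores):
    """Return (top label, top score, ratio of top to second score or None)."""
    labels = cell_types.split("|")
    values = [float(s) for s in str(scores).split("|")]
    # The ratio is only defined when a second candidate is listed.
    ratio = values[0] / values[1] if len(values) > 1 else None
    return labels[0], values[0], ratio

summary = {cluster: summarise(ct, sc) for cluster, _type, ct, sc in rows}
for cluster, (label, score, ratio) in summary.items():
    print(cluster, label, round(score, 3), ratio)
```

A ratio close to 1, as for cluster 7, means the two best candidates are practically tied, so such a label deserves a manual check against known marker genes.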
We can query only the better annotated results
scsa.cell_anno_print()
Cluster:0 Cell_type:T Cells|T Memory Cells Z-score:3.718|3.364
Cluster:1 Cell_type:T Cells|T Memory Cells Z-score:3.531|3.118
Cluster:2 Cell_type:Monocytes|Alveolar Macrophages Z-score:3.646|2.922
Cluster:3 Cell_type:B Cells Naive|B Cells Memory Z-score:4.328|3.955
Cluster:4 Cell_type:NK Cells|T Cells Z-score:2.967|2.681
Cluster:5 Cell_type:Monocytes|Macrophages Z-score:3.775|2.844
Cluster:6 Cell_type:NK Cells|Decidual Cells Z-score:4.137|2.881
Cluster:7 Cell_type:Decidual Cells|NK Cells Z-score:1.654|1.654
Cluster:8 Cell_type:Monocytes|Alveolar Macrophages Z-score:2.722|2.162
Cluster:9 Cell_type:Dendritic Cells|Langerhans Cells Z-score:4.146|3.729
Nice:Cluster:10 Cell_type:Platelets Z-score:7.433
scsa.cell_auto_anno(adata,key='scsa_celltype_panglaodb')
...cell type added to scsa_celltype_panglaodb on obs of anndata
Here, we introduce the dimensionality reduction visualisation function ov.utils.embedding, which is similar to scanpy.pl.embedding, except that when we set frameon='small', we scale the axes to the bottom-left corner and scale the colourbar to the bottom-right corner.
adata: the AnnData object
basis: the embedding to visualise, stored in adata.obsm
color: the obs/var keys to visualise
legend_loc: the location of the legend; if set to None, the legend is drawn on the right
frameon: can be set to 'small', False or None
legend_fontoutline: the outline of the legend text
palette: the colours used for the different categories. omicverse provides several preset palettes, including ov.utils.palette(), ov.utils.red_color, ov.utils.blue_color, ov.utils.green_color and ov.utils.orange_color. The preset colours can help you achieve a more attractive visualisation.
ov.utils.embedding(adata,
                   basis='X_mde',
                   color=['leiden','scsa_celltype_cellmarker','scsa_celltype_panglaodb'],
                   legend_loc='on data',
                   frameon='small',
                   legend_fontoutline=2,
                   palette=ov.utils.palette()[14:],
)
If you want to draw stacked bar plots of cell type proportions, you first need to colour the groups you intend to draw using ov.utils.embedding. Then use ov.utils.plot_cellproportion to specify the groups you want to plot, and you will see the cell type proportions in the different groups.
# Arbitrarily designate the first 1000 cells as group B and the rest as group A
adata.obs['group']='A'
adata.obs.loc[adata.obs.index[:1000],'group']='B'
# Colour the embedding by group
ov.utils.embedding(adata,
                   basis='X_mde',
                   color=['group'],
                   frameon='small',legend_fontoutline=2,
                   palette=ov.utils.red_color,
)
ov.utils.plot_cellproportion(adata=adata,celltype_clusters='scsa_celltype_cellmarker',
                             visual_clusters='group',
                             visual_name='group',figsize=(2,4))
(<Figure size 160x320 with 1 Axes>,
<AxesSubplot: xlabel='group', ylabel='Cells per Stage'>)
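To make explicit what such a plot shows, here is a minimal sketch of the underlying quantity: the fraction of each cell type within each group. It is an illustration with hypothetical toy data and a hypothetical helper `cell_proportions`, not the implementation of ov.utils.plot_cellproportion.

```python
from collections import Counter

# Toy (group, cell type) pairs; hypothetical data, not the tutorial dataset.
cells = [
    ("A", "T cell"), ("A", "T cell"), ("A", "B cell"), ("A", "Monocyte"),
    ("B", "T cell"), ("B", "B cell"), ("B", "B cell"), ("B", "B cell"),
]

def cell_proportions(pairs):
    """Return {group: {cell_type: fraction of that group's cells}}."""
    counts = Counter(pairs)                          # cells per (group, type)
    totals = Counter(group for group, _ in pairs)    # cells per group
    result = {}
    for (group, cell_type), n in counts.items():
        result.setdefault(group, {})[cell_type] = n / totals[group]
    return result

proportions = cell_proportions(cells)
for group in sorted(proportions):
    print(group, proportions[group])
```

On real data, the same table could be obtained with pandas.crosstab(adata.obs['group'], adata.obs['scsa_celltype_cellmarker'], normalize='index'), where each row sums to 1 and corresponds to one stacked bar.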
We also provide another way to visualise the low-dimensional embedding, ov.utils.plot_embedding_celltype:
ov.utils.plot_embedding_celltype(adata,figsize=None,basis='X_mde',
                                 celltype_key='scsa_celltype_cellmarker',
                                 title=' Cell type',
                                 celltype_range=(2,6),
                                 embedding_range=(4,10),)
(<Figure size 480x320 with 2 Axes>,
[<AxesSubplot: xlabel='X_mde1', ylabel='X_mde2'>, <AxesSubplot: >])
We calculated the ratio of observed to expected cell numbers (Ro/e) for each cluster in different tissues to quantify the tissue preference of each cluster (Guo et al., 2018; Zhang et al., 2018). The expected cell numbers for each combination of cell clusters and tissues were obtained from the chi-square test. A cluster was identified as being enriched in a specific tissue if Ro/e > 1. In the example below, the groups A and B play the role of the tissues.
The Ro/e function was written by Haihao Zhang.
roe=ov.utils.roe(adata,sample_key='group',cell_type_key='scsa_celltype_cellmarker')
chi2: 1.426243142767526, dof: 5, pvalue: 0.9214211744335161
P-value is greater than 0.05, there is no statistical significance
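The Ro/e definition above can be sketched in a few lines. This is an illustration under stated assumptions, not the code of ov.utils.roe: the helper `ro_e` and the toy counts are hypothetical. The expected count for a cell type and group is taken as row total × column total / grand total, which is the expectation used by the chi-square test of independence, and Ro/e is the observed count divided by that expectation.

```python
# Toy observed counts: rows are cell types, columns are groups.
# Hypothetical numbers, not taken from the tutorial dataset.
observed = {
    "T cell":   {"A": 30, "B": 10},
    "B cell":   {"A": 10, "B": 30},
    "Monocyte": {"A": 20, "B": 20},
}

def ro_e(table):
    """Return (Ro/e table, chi-square statistic, degrees of freedom)."""
    groups = sorted({g for row in table.values() for g in row})
    row_totals = {ct: sum(row.values()) for ct, row in table.items()}
    col_totals = {g: sum(row.get(g, 0) for row in table.values()) for g in groups}
    grand_total = sum(row_totals.values())

    ratios, chi2 = {}, 0.0
    for ct, row in table.items():
        ratios[ct] = {}
        for g in groups:
            # Expected count under independence of cell type and group.
            expected = row_totals[ct] * col_totals[g] / grand_total
            obs = row.get(g, 0)
            ratios[ct][g] = obs / expected
            chi2 += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(groups) - 1)
    return ratios, chi2, dof

ratios, chi2, dof = ro_e(observed)
for ct, row in ratios.items():
    print(ct, row)
print("chi2:", chi2, "dof:", dof)
```

In this toy table, T cells are enriched in group A (Ro/e = 1.5) and depleted in group B (Ro/e = 0.5), while monocytes show no preference (Ro/e = 1.0).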
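import seaborn as sns
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(2,4))
# Map each Ro/e value to a symbol used as the heatmap annotation
# (in pandas >= 2.1, applymap is deprecated in favour of DataFrame.map)
transformed_roe = roe.copy()
transformed_roe = transformed_roe.applymap(
    lambda x: '+++' if x >= 2 else ('++' if x >= 1.5 else ('+' if x >= 1 else '+/-')))
# Colour the cells by the numeric Ro/e value and annotate them with the symbols
sns.heatmap(roe, annot=transformed_roe, cmap='RdBu_r', fmt='',
            cbar=True, ax=ax,vmin=0.5,vmax=1.5,cbar_kws={'shrink':0.5})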
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.xlabel('Group',fontsize=13)
plt.ylabel('Cell type',fontsize=13)
plt.title('Ro/e',fontsize=13)
Text(0.5, 1.0, 'Ro/e')