
Tutorial: Single-cell RNA-seq: Annotation: Celltype auto annotation with SCSA-2

18 months ago

Julia Ma ▴ 140

Part-1 here: Single-cell RNA-seq: Annotation: Celltype auto annotation with SCSA

Part-3 here: Single-cell RNA-seq: Annotation: Celltype auto annotation with SCSA-3

Full article lifted from: https://omicverse.readthedocs.io/en/latest/Tutorials-single/t_cellanno/

We can also use PanglaoDB as the target database to annotate the cell types:

scsa=ov.single.pySCSA(adata=adata,
                      foldchange=1.5,
                      pvalue=0.01,
                      celltype='normal',
                      target='panglaodb',
                      tissue='All',
                      model_path='temp/pySCSA_2023_v2_plus.db')

res=scsa.cell_anno(clustertype='leiden',
                   cluster='all',rank_rep=True)

ranking genes
    finished (0:00:01)
...Auto annotate cell
Version V2.1 [2023/06/27]
DB load: GO_items:47347,Human_GO:3,Mouse_GO:3,
CellMarkers:82887,CancerSEA:1574,PanglaoDB:24223
Ensembl_HGNC:61541,Ensembl_Mouse:55414
<omicverse.single._SCSA.Annotator object at 0x2c89d0250>
Version V2.1 [2023/06/27]
DB load: GO_items:47347,Human_GO:3,Mouse_GO:3,
CellMarkers:82887,CancerSEA:1574,PanglaoDB:24223
Ensembl_HGNC:61541,Ensembl_Mouse:55414
load markers: 70276
Cluster 0 Gene number: 75
Other Gene number: 701
Cluster 1 Gene number: 144
Other Gene number: 672
Cluster 10 Gene number: 123
Other Gene number: 675
Cluster 2 Gene number: 515
Other Gene number: 667
Cluster 3 Gene number: 129
Other Gene number: 664
Cluster 4 Gene number: 82
Other Gene number: 701
Cluster 5 Gene number: 931
Other Gene number: 612
Cluster 6 Gene number: 243
Other Gene number: 663
Cluster 7 Gene number: 5
Other Gene number: 712
Cluster 8 Gene number: 67
Other Gene number: 712
Cluster 9 Gene number: 587
Other Gene number: 670
#Cluster Type Celltype Score Times
['0', '?', 'T Cells|T Memory Cells', '3.7178236936447595|3.363511285637478', 1.1053400384057666]
['1', '?', 'T Cells|T Memory Cells', '3.531461316580769|3.117745502299072', 1.1326971088488835]
['10', 'Good', 'Platelets', 7.432982147841601, 2.4362620739591723]
['2', '?', 'Monocytes|Alveolar Macrophages', '3.6455690759322272|2.922079559631339', 1.2475940512694892]
['3', '?', 'B Cells Naive|B Cells Memory', '4.328429443730031|3.9545790694449328', 1.0945360726691886]
['4', '?', 'NK Cells|T Cells', '2.96702571977588|2.680719557955109', 1.1068019819421802]
['5', '?', 'Monocytes|Macrophages', '3.774802905873644|2.8442136919017464', 1.327186813220659]
['6', '?', 'NK Cells|Decidual Cells', '4.137392815321161|2.880539867188994', 1.4363254827501069]
['7', '?', 'Decidual Cells|NK Cells', '1.6543443520623498|1.6543443520623498', 1.0]
['8', '?', 'Monocytes|Alveolar Macrophages', '2.7222000722461677|2.1616279260313114', 1.2593286936499064]
['9', '?', 'Dendritic Cells|Langerhans Cells', '4.146171305699119|3.728571260268892', 1.1120000172398774]
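
In the table above, the Times column matches the ratio of the top score to the runner-up score, and the only cluster typed Good (cluster 10) is also the only one whose Times value exceeds 2. The following minimal sketch checks the ratio on a few rows copied from the output. The helper name score_ratio is hypothetical and not part of omicverse, and reading Times as a score ratio is inferred from the numbers shown here rather than taken from the SCSA documentation.

```python
# Rows copied verbatim from the annotation output above:
# [cluster, type, cell types, scores, times]
rows = [
    ['0', '?', 'T Cells|T Memory Cells',
     '3.7178236936447595|3.363511285637478', 1.1053400384057666],
    ['2', '?', 'Monocytes|Alveolar Macrophages',
     '3.6455690759322272|2.922079559631339', 1.2475940512694892],
    ['7', '?', 'Decidual Cells|NK Cells',
     '1.6543443520623498|1.6543443520623498', 1.0],
]


def score_ratio(score_field):
    """Hypothetical helper: ratio of the first to the second '|'-separated score."""
    top, second = (float(s) for s in score_field.split('|'))
    return top / second


for cluster, label, celltypes, scores, times in rows:
    best_type = celltypes.split('|')[0]
    ratio = score_ratio(scores)
    # Compare the recomputed ratio with the Times value reported by the tool.
    print(f"Cluster {cluster}: top candidate={best_type}, "
          f"ratio={ratio:.4f}, reported Times={times:.4f}")
```

A ratio close to 1 (as for cluster 7) means the two best candidates are nearly indistinguishable, so such clusters are worth checking by hand against known marker genes.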

We can print a condensed summary of the annotation; clusters whose annotation was typed Good above are prefixed with Nice:

scsa.cell_anno_print()

Cluster:0   Cell_type:T Cells|T Memory Cells    Z-score:3.718|3.364
Cluster:1   Cell_type:T Cells|T Memory Cells    Z-score:3.531|3.118
Cluster:2   Cell_type:Monocytes|Alveolar Macrophages    Z-score:3.646|2.922
Cluster:3   Cell_type:B Cells Naive|B Cells Memory  Z-score:4.328|3.955
Cluster:4   Cell_type:NK Cells|T Cells  Z-score:2.967|2.681
Cluster:5   Cell_type:Monocytes|Macrophages Z-score:3.775|2.844
Cluster:6   Cell_type:NK Cells|Decidual Cells   Z-score:4.137|2.881
Cluster:7   Cell_type:Decidual Cells|NK Cells   Z-score:1.654|1.654
Cluster:8   Cell_type:Monocytes|Alveolar Macrophages    Z-score:2.722|2.162
Cluster:9   Cell_type:Dendritic Cells|Langerhans Cells  Z-score:4.146|3.729
Nice:Cluster:10 Cell_type:Platelets Z-score:7.433

scsa.cell_auto_anno(adata,key='scsa_celltype_panglaodb')

...cell type added to scsa_celltype_panglaodb on obs of anndata

Here we introduce the dimensionality reduction visualisation function ov.utils.embedding. It is similar to scanpy.pl.embedding, except that with frameon='small' the axes are scaled down into the bottom-left corner and the colourbar is scaled down into the bottom-right corner. Its main parameters are:

adata: the AnnData object
basis: the embedding to visualise, stored in adata.obsm
color: the obs column(s) or var name(s) to colour by
legend_loc: the location of the legend; if set to None, the legend is drawn on the right
frameon: can be set to 'small', False or None
legend_fontoutline: the width of the outline around the legend text
palette: the colours used for the categories. omicverse ships several presets, including ov.utils.palette(), ov.utils.red_color, ov.utils.blue_color, ov.utils.green_color and ov.utils.orange_color, which can help you achieve a more attractive visualisation.

ov.utils.embedding(adata,
                   basis='X_mde',
                   color=['leiden','scsa_celltype_cellmarker','scsa_celltype_panglaodb'], 
                   legend_loc='on data', 
                   frameon='small',
                   legend_fontoutline=2,
                   palette=ov.utils.palette()[14:],
                  )


If you want to draw a stacked bar chart of cell type proportions, you first need to colour the groups you intend to plot using ov.utils.embedding. Then pass those groups to ov.utils.plot_cellproportion to plot the cell type proportions within each group.

# Arbitrarily assign the first 1000 cells to group B and the rest to group A
adata.obs['group']='A'
adata.obs.loc[adata.obs.index[:1000],'group']='B'
# Colour the embedding by group
ov.utils.embedding(adata,
                   basis='X_mde',
                   color=['group'], 
                   frameon='small',legend_fontoutline=2,
                   palette=ov.utils.red_color,
                  )


ov.utils.plot_cellproportion(adata=adata,celltype_clusters='scsa_celltype_cellmarker',
                    visual_clusters='group',
                    visual_name='group',figsize=(2,4))



(<Figure size 160x320 with 1 Axes>,
 <AxesSubplot: xlabel='group', ylabel='Cells per Stage'>)

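
As a rough illustration of the quantity such a stacked bar chart displays, the sketch below computes the fraction of each cell type within each group using only the standard library. The function name celltype_proportions and the toy labels are hypothetical; this is not how ov.utils.plot_cellproportion is implemented, only an assumed equivalent of the numbers it plots.

```python
from collections import Counter, defaultdict


def celltype_proportions(groups, celltypes):
    """Return {group: {cell type: fraction of that group's cells}}."""
    counts = defaultdict(Counter)
    for group, celltype in zip(groups, celltypes):
        counts[group][celltype] += 1
    proportions = {}
    for group, counter in counts.items():
        n_cells = sum(counter.values())
        proportions[group] = {ct: n / n_cells for ct, n in counter.items()}
    return proportions


# Toy labels for illustration only (not taken from the tutorial dataset).
groups = ['A', 'A', 'A', 'A', 'B', 'B']
celltypes = ['T Cells', 'T Cells', 'B Cells', 'Monocytes', 'T Cells', 'B Cells']

props = celltype_proportions(groups, celltypes)
for group, fractions in props.items():
    print(group, fractions)
```

With the AnnData object from this tutorial, the two inputs would presumably be adata.obs['group'] and adata.obs['scsa_celltype_cellmarker'].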

We also provide another dimensionality reduction visualisation, ov.utils.plot_embedding_celltype:

ov.utils.plot_embedding_celltype(adata,figsize=None,basis='X_mde',
                            celltype_key='scsa_celltype_cellmarker',
                            title='            Cell type',
                            celltype_range=(2,6),
                            embedding_range=(4,10),)

(<Figure size 480x320 with 2 Axes>,
 [<AxesSubplot: xlabel='X_mde1', ylabel='X_mde2'>, <AxesSubplot: >])


We calculated the ratio of observed to expected cell numbers (Ro/e) for each cluster in different tissues to quantify the tissue preference of each cluster (Guo et al., 2018; Zhang et al., 2018). The expected cell numbers for each combination of cell cluster and tissue were obtained from the chi-square test. A cluster was considered enriched in a specific tissue if Ro/e > 1.

The Ro/e function was written by Haihao Zhang.
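
The Ro/e calculation described above can be sketched in plain Python. Under a chi-square test of independence, the expected count for a cell type in a group is (row total × column total) / grand total, and Ro/e is the observed count divided by that expected count. The function name roe_table and the toy counts are hypothetical; this is a minimal sketch of the idea, not the implementation behind ov.utils.roe.

```python
def roe_table(counts):
    """counts: {cell type: {group: observed cell count}} -> {cell type: {group: Ro/e}}."""
    groups = sorted({g for row in counts.values() for g in row})
    row_totals = {ct: sum(row.get(g, 0) for g in groups) for ct, row in counts.items()}
    col_totals = {g: sum(row.get(g, 0) for row in counts.values()) for g in groups}
    total = sum(row_totals.values())

    roe = {}
    for ct, row in counts.items():
        roe[ct] = {}
        for g in groups:
            # Expected count if cell type and group were independent.
            # Assumes every cell type and every group has at least one cell.
            expected = row_totals[ct] * col_totals[g] / total
            roe[ct][g] = row.get(g, 0) / expected
    return roe


# Made-up counts for illustration only (not from the tutorial dataset).
toy_counts = {
    'T cell': {'A': 60, 'B': 20},
    'B cell': {'A': 20, 'B': 20},
    'Monocyte': {'A': 20, 'B': 60},
}

roe_toy = roe_table(toy_counts)
for ct, row in roe_toy.items():
    print(ct, {g: round(v, 2) for g, v in row.items()})
```

In this toy table, T cells have 60 observed cells in group A against 40 expected, giving Ro/e = 1.5, which would count as enriched under the Ro/e > 1 criterion.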

roe=ov.utils.roe(adata,sample_key='group',cell_type_key='scsa_celltype_cellmarker')

chi2: 1.426243142767526, dof: 5, pvalue: 0.9214211744335161
P-value is greater than 0.05, there is no statistical significance




import seaborn as sns
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(2,4))

# Convert each Ro/e value into an enrichment symbol used as the cell annotation
# (applymap is deprecated since pandas 2.1; DataFrame.map is its replacement there)
transformed_roe = roe.copy()
transformed_roe = transformed_roe.applymap(
    lambda x: '+++' if x >= 2 else ('++' if x >= 1.5 else ('+' if x >= 1 else '+/-')))

# Colour the cells by the numeric Ro/e value and annotate them with the symbols
sns.heatmap(roe, annot=transformed_roe, cmap='RdBu_r', fmt='', 
            cbar=True, ax=ax,vmin=0.5,vmax=1.5,cbar_kws={'shrink':0.5})
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)

plt.xlabel('Group',fontsize=13)
plt.ylabel('Cell type',fontsize=13)
plt.title('Ro/e',fontsize=13)

Text(0.5, 1.0, 'Ro/e')

