Part-1 here: Single-cell RNA-seq: Annotation: Celltype auto annotation with SCSA
Part-2 here: Single-cell RNA-seq: Annotation: Celltype auto annotation with SCSA-2
Full article lifted from: https://omicverse.readthedocs.io/en/latest/Tutorials-single/t_cellanno/
Manual cell annotation
To assess the accuracy of the automatic annotation, we also annotate the clusters manually using marker genes and then compare the manual result with the pySCSA result.
First, we prepare a dictionary that maps each cell type to its marker genes:
res_marker_dict={
'Megakaryocyte':['ITGA2B','ITGB3'],
'Dendritic cell':['CLEC10A','IDO1'],
'Monocyte' :['S100A8','S100A9','LST1',],
'Macrophage':['CSF1R','CD68'],
'B cell':['MS4A1','CD79A','MZB1',],
'NK/NKT cell':['GNLY','KLRD1'],
'CD8+T cell':['CD8A','CD8B'],
'Treg':['CD4','CD40LG','IL7R','FOXP3','IL2RA'],
'CD4+T cell':['PTPRC','CD3D','CD3E'],
}
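Before plotting, it can help to check which markers are actually present in the data. If a marker gene is absent from the genes being plotted (for example because it was filtered out during preprocessing), the dot plot call can fail with a KeyError. The helper below is a hypothetical sketch, not part of omicverse or scanpy. In the tutorial you would pass `res_marker_dict` and `adata.var_names`.

```python
# Hypothetical helper, not part of omicverse or scanpy
def split_markers(marker_dict, var_names):
    """Split each cell type's markers into genes found / not found in var_names."""
    available = set(var_names)
    present, missing = {}, {}
    for celltype, genes in marker_dict.items():
        present[celltype] = [g for g in genes if g in available]
        missing[celltype] = [g for g in genes if g not in available]
    return present, missing


# Toy example; in the tutorial: split_markers(res_marker_dict, adata.var_names)
markers = {'B cell': ['MS4A1', 'CD79A', 'MZB1'], 'Treg': ['FOXP3', 'IL2RA']}
genes_in_data = ['MS4A1', 'CD79A', 'IL2RA', 'CD3E']
present, missing = split_markers(markers, genes_in_data)
print(present)  # {'B cell': ['MS4A1', 'CD79A'], 'Treg': ['IL2RA']}
print(missing)  # {'B cell': ['MZB1'], 'Treg': ['FOXP3']}
```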
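We then compute, for each cluster, the mean expression of each marker gene and the fraction of cells expressing it, and display both as a dot plot with a dendrogram of the Leiden clusters: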
sc.tl.dendrogram(adata,'leiden')
sc.pl.dotplot(adata, res_marker_dict, 'leiden',
dendrogram=True,standard_scale='var')
WARNING: You’re trying to run this on 2000 dimensions of `.X`, if you really want this, set `use_rep='X'`.
Falling back to preprocessing with `sc.pp.pca` and default params.
computing PCA
with n_comps=50
finished (0:00:00)
Storing dendrogram info using `.uns['dendrogram_leiden']`
WARNING: Groups are not reordered because the `groupby` categories and the `var_group_labels` are different.
categories: 0, 1, 2, etc.
var_group_labels: Megakaryocyte, Dendritic cell, Monocyte, etc.
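The dot plot encodes two per-cluster statistics for every gene: the colour shows the mean expression (here min-max scaled per gene because of `standard_scale='var'`), and the dot size shows the fraction of cells with non-zero expression. The sketch below recomputes both with pandas on a toy matrix. It roughly follows scanpy's defaults (expression cutoff of 0, mean taken over all cells) but is an illustration, not scanpy's implementation; `dotplot_stats` is a hypothetical name.

```python
import numpy as np
import pandas as pd

def dotplot_stats(X, genes, clusters):
    """Per-cluster statistics behind a marker-gene dot plot.

    X: dense cells x genes expression matrix; clusters: one label per cell.
    Returns (scaled mean expression, fraction of cells with expression > 0).
    """
    groups = np.asarray(clusters)
    df = pd.DataFrame(X, columns=genes)
    mean_expr = df.groupby(groups).mean()
    frac_expr = (df > 0).groupby(groups).mean()
    # Mimic standard_scale='var': min-max scale each gene across clusters;
    # a gene with identical means in all clusters gives 0/0 = NaN, set to 0
    shifted = mean_expr - mean_expr.min(axis=0)
    scaled = (shifted / shifted.max(axis=0)).fillna(0)
    return scaled, frac_expr

# Toy data: 4 cells, 2 genes, 2 clusters
X = np.array([[0, 1], [2, 0], [4, 3], [0, 3]], dtype=float)
scaled, frac = dotplot_stats(X, ['A', 'B'], ['0', '0', '1', '1'])
print(scaled)  # dot colour: scaled mean expression
print(frac)    # dot size: fraction of expressing cells
```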
Based on the dot plot, we assign a cell type name to each cluster and add the labels to the AnnData object with ov.single.scanpy_cellanno_from_dict:
# map each Leiden cluster to a cell type label
cluster2annotation = {
'0': 'T cell',
'1': 'T cell',
'2': 'Monocyte',
'3': 'B cell',
'4': 'T cell',
'5': 'Macrophage',
'6': 'NKT cells',
'7': 'T cell',
'8': 'Monocyte',
'9': 'Dendritic cell',
'10': 'Megakaryocyte',
}
ov.single.scanpy_cellanno_from_dict(adata,anno_dict=cluster2annotation,
clustertype='leiden')
...cell type added to major_celltype on obs of anndata
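As the log shows, the labels end up in `adata.obs['major_celltype']`. Conceptually this is a per-cell dictionary lookup. The plain pandas sketch below (a hypothetical helper, not omicverse's implementation) does the same lookup and also reports clusters that the dictionary forgot to label, which is an easy mistake when the number of Leiden clusters changes.

```python
import pandas as pd

def annotate_clusters(clusters, cluster2annotation):
    """Map per-cell cluster IDs to cell type labels; report clusters without a label."""
    clusters = pd.Series(clusters).astype(str)
    labels = clusters.map(cluster2annotation)
    unmapped = sorted(clusters[labels.isna()].unique())
    return labels, unmapped

# Toy per-cell Leiden labels (in the tutorial: adata.obs['leiden'])
cells = ['0', '5', '9', '11']
mapping = {'0': 'T cell', '5': 'Macrophage', '9': 'Dendritic cell'}
labels, unmapped = annotate_clusters(cells, mapping)
print(labels.tolist())  # ['T cell', 'Macrophage', 'Dendritic cell', nan]
print(unmapped)         # ['11']
```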
Comparing pySCSA and manual annotation
Plotting the manual annotation (major_celltype) and the pySCSA annotation (scsa_celltype_cellmarker) side by side on the MDE embedding shows that the two are almost identical. The only difference concerns monocytes and macrophages, and in the earlier auto-annotation results pySCSA already offered Monocyte|Macrophage as a candidate for that cluster. We can therefore conclude that pySCSA performs well on the pbmc3k data.
ov.utils.embedding(adata,
basis='X_mde',
color=['major_celltype','scsa_celltype_cellmarker'],
legend_loc='on data', frameon='small',legend_fontoutline=2,
palette=ov.utils.palette()[14:],
)
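The visual comparison can be backed by a number. The sketch below cross-tabulates two per-cell label columns and computes the fraction of cells on which they agree. It uses toy labels; in the tutorial the inputs would be `adata.obs['major_celltype']` and `adata.obs['scsa_celltype_cellmarker']`. Because the two annotations use different names for some cell types, a small harmonization dictionary is applied first; the mapping shown (treating 'NKT cells' as 'Natural killer cell') is a hypothetical judgment call, not something pySCSA does.

```python
import pandas as pd

# Toy per-cell labels standing in for the two obs columns
manual = pd.Series(['T cell', 'T cell', 'Macrophage', 'B cell', 'NKT cells', 'Monocyte'])
auto = pd.Series(['T cell', 'T cell', 'Monocyte', 'B cell', 'Natural killer cell', 'Monocyte'])

# Hypothetical harmonization: align naming differences before comparing
harmonize = {'NKT cells': 'Natural killer cell'}
manual_h = manual.replace(harmonize)

# Rows: manual label, columns: pySCSA label
table = pd.crosstab(manual_h, auto, rownames=['manual'], colnames=['pySCSA'])
agreement = (manual_h == auto).mean()
print(table)
print(f"per-cell agreement: {agreement:.2f}")  # per-cell agreement: 0.83
```

Off-diagonal entries of the table (here Macrophage versus Monocyte) point directly at the clusters worth inspecting.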
We can use ov.single.get_celltype_marker to obtain the marker genes of each cell type:
marker_dict=ov.single.get_celltype_marker(adata,clustertype='scsa_celltype_cellmarker')
marker_dict.keys()
...get cell type marker
ranking genes
finished (0:00:01)
dict_keys(['B cell', 'Dendritic cell', 'Megakaryocyte', 'Monocyte', 'Natural killer cell', 'T cell'])
marker_dict['B cell']
array(['CD74', 'CD79A', 'HLA-DRA', 'CD79B', 'HLA-DPB1', 'HLA-DQA1',
'MS4A1', 'HLA-DQB1', 'HLA-DRB1', 'CD37', 'HLA-DPA1', 'HLA-DRB5',
'TCL1A'], dtype=object)
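A quick sanity check is to compare these data-driven markers with the hand-picked markers used for the manual annotation. The hypothetical helper below works on any two dictionaries of gene lists (NumPy arrays such as the values of `marker_dict` work too) and only considers cell types whose names appear in both. The B cell list is copied from the output above.

```python
def compare_markers(auto_markers, manual_markers):
    """For cell types present in both dicts, report shared and manual-only genes."""
    report = {}
    for celltype in sorted(set(auto_markers) & set(manual_markers)):
        auto_set = set(auto_markers[celltype])
        manual_set = set(manual_markers[celltype])
        report[celltype] = {
            'shared': sorted(auto_set & manual_set),
            'manual_only': sorted(manual_set - auto_set),
        }
    return report

# B cell markers copied from marker_dict['B cell'] above, plus manual markers
scsa_markers = {'B cell': ['CD74', 'CD79A', 'HLA-DRA', 'CD79B', 'HLA-DPB1', 'HLA-DQA1',
                           'MS4A1', 'HLA-DQB1', 'HLA-DRB1', 'CD37', 'HLA-DPA1', 'HLA-DRB5',
                           'TCL1A']}
manual_markers = {'B cell': ['MS4A1', 'CD79A', 'MZB1'], 'Treg': ['FOXP3']}
report = compare_markers(scsa_markers, manual_markers)
print(report)
# {'B cell': {'shared': ['CD79A', 'MS4A1'], 'manual_only': ['MZB1']}}
```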
Tissue names in the database
To annotate cell types in a specific tissue, we can list the tissues available in the database with get_model_tissue:
scsa.get_model_tissue()
Version V2.1 [2023/06/27]
DB load: GO_items:47347,Human_GO:3,Mouse_GO:3,
CellMarkers:82887,CancerSEA:1574,PanglaoDB:24223
Ensembl_HGNC:61541,Ensembl_Mouse:55414
########################################################################################################################
------------------------------------------------------------------------------------------------------------------------
Species:Human Num:298
------------------------------------------------------------------------------------------------------------------------
1: Abdomen 2: Abdominal adipose tissue 3: Abdominal fat pad
4: Acinus 5: Adipose tissue 6: Adrenal gland
7: Adventitia 8: Airway 9: Airway epithelium
10: Allocortex 11: Alveolus 12: Amniotic fluid
13: Amniotic membrane 14: Ampullary 15: Anogenital tract
16: Antecubital vein 17: Anterior cruciate ligament 18: Anterior presomitic mesoderm
19: Aorta 20: Aortic valve 21: Artery
22: Arthrosis 23: Articular Cartilage 24: Ascites
25: Ascitic fluid 26: Atrium 27: Basal airway
28: Basilar membrane 29: Beige Fat 30: Bile duct
31: Biliary tract 32: Bladder 33: Blood
34: Blood vessel 35: Bone 36: Bone marrow
37: Brain 38: Breast 39: Bronchial vessel
40: Bronchiole 41: Bronchoalveolar lavage 42: Bronchoalveolar system
43: Bronchus 44: Brown adipose tissue 45: Calvaria
46: Capillary 47: Cardiac atrium 48: Cardiovascular system
49: Carotid artery 50: Carotid plaque 51: Cartilage
52: Caudal cortex 53: Caudal forebrain 54: Caudal ganglionic eminence
55: Cavernosum 56: Central amygdala 57: Central nervous system
58: Cerebellum 59: Cerebral organoid 60: Cerebrospinal fluid
61: Cervix 62: Choriocapillaris 63: Chorionic villi
64: Chorionic villus 65: Choroid 66: Choroid plexus
67: Colon 68: Colon epithelium 69: Colorectum
70: Cornea 71: Corneal endothelium 72: Corneal epithelium
73: Coronary artery 74: Corpus callosum 75: Corpus luteum
76: Cortex 77: Cortical layer 78: Cortical thymus
79: Decidua 80: Deciduous tooth 81: Dental pulp
82: Dermis 83: Diencephalon 84: Distal airway
85: Dorsal forebrain 86: Dorsal root ganglion 87: Dorsolateral prefrontal cortex
88: Ductal tissue 89: Duodenum 90: Ectocervix
91: Ectoderm 92: Embryo 93: Embryoid body
94: Embryonic Kidney 95: Embryonic brain 96: Embryonic heart
97: Embryonic prefrontal cortex 98: Embryonic stem cell 99: Endocardium
100: Endocrine 101: Endoderm 102: Endometrium
103: Endometrium stroma 104: Entorhinal cortex 105: Epidermis
106: Epithelium 107: Esophageal 108: Esophagus
109: Eye 110: Fat pad 111: Fetal brain
...
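The tissue listing is long (298 entries for human), so searching it programmatically can be convenient. The sketch below assumes the printed table has been copied into a string; it parses the "number: name" pairs and runs a case-insensitive substring search. Both helpers are hypothetical and not part of omicverse.

```python
import re

# Matches "<number>: <name>" pairs; a name ends where the next "<number>:" begins
_ENTRY = re.compile(r"(\d+):\s+(.+?)(?=\s+\d+:|\s*$)")

def parse_tissue_listing(text):
    """Parse the printed tissue table into {index: tissue name}."""
    tissues = {}
    for line in text.splitlines():
        for idx, name in _ENTRY.findall(line):
            tissues[int(idx)] = name
    return tissues

def find_tissues(tissues, query):
    """Case-insensitive substring search over tissue names."""
    q = query.lower()
    return [name for name in tissues.values() if q in name.lower()]

# Two rows copied from the listing above
listing = """31: Biliary tract 32: Bladder 33: Blood
34: Blood vessel 35: Bone 36: Bone marrow"""
tissues = parse_tissue_listing(listing)
print(find_tissues(tissues, "blood"))  # ['Blood', 'Blood vessel']
```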