I am using find_marker_genes from stereopy (for Stereo-seq spatial data).
I have merged two data objects, data1 and data2, into dataM and then ran normalization, PCA, batch integration and clustering:
dataM = st.utils.data_helper.merge(data1, data2)
dataM.tl.raw_checkpoint()
dataM.tl.normalize_total()
dataM.tl.log1p()
dataM.tl.pca(use_highly_genes=False, n_pcs=30, res_key='pca', svd_solver='arpack')
dataM.tl.batches_integrate(pca_res_key='pca', res_key='pca_integrated')
dataM.tl.neighbors(pca_res_key='pca_integrated', n_pcs=30,
res_key='neighbors_integrated', n_jobs=8)
dataM.tl.umap(pca_res_key='pca_integrated',
neighbors_res_key='neighbors_integrated', res_key='umap_integrated')
dataM.plt.batches_umap(res_key='umap_integrated')
dataM.tl.leiden(neighbors_res_key='neighbors_integrated',
res_key='leiden', resolution=0.75)
dataM.plt.cluster_scatter(res_key='leiden')
Next, I added a column to dataM.cells that combines dataM.cells['batch'] and dataM.cells['leiden'] into a single label:
dataM.cells['batch_leiden_combination'] = dataM.cells['batch'].astype(str) + ':' + dataM.cells['leiden'].astype(str)
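As a minimal sketch of what this line does, the same concatenation can be tried on a toy pandas DataFrame. The frame `cells` and its values are made up for illustration and only stand in for dataM.cells; they are not taken from the real data.

```python
import pandas as pd

# Toy stand-in for dataM.cells (hypothetical values, for illustration only).
cells = pd.DataFrame({
    "batch": ["0", "0", "1", "1"],
    "leiden": [1, 2, 1, 2],
})

# Cast both columns to str before joining, so numeric or categorical
# dtypes do not break the string concatenation.
cells["batch_leiden_combination"] = (
    cells["batch"].astype(str) + ":" + cells["leiden"].astype(str)
)

combined_labels = cells["batch_leiden_combination"].tolist()
print(combined_labels)
```

Each resulting label has the form batch:cluster, which is the format seen in the unique values of the real column.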
dataM
StereoExpData object with n_cells X n_genes = 103076 X 12451
bin_type: bins
bin_size: 50
offset_x = 2841
offset_y = 4296
cells: ['cell_name', 'batch', 'total_counts', 'pct_counts_mt', 'n_genes_by_counts', 'leiden', 'batch_leiden_combination']
genes: ['gene_name']
cells_matrix = ['pca', 'pca_integrated', 'umap_integrated']
cells_pairwise = ['neighbors_integrated']
key_record: {'pca': ['pca', 'pca_integrated'], 'neighbors': ['neighbors_integrated'], 'umap': ['umap_integrated'], 'cluster': ['leiden'], 'marker_genes': ['marker_genes'], 'gene_exp_cluster': ['gene_exp_leiden']}
unique_batch_leiden = dataM.cells['batch_leiden_combination'].unique()
print(unique_batch_leiden)
['0:3' '0:1' '0:2' '0:8' '0:5' '0:7' '0:14' '0:19' '0:9' '0:6' '0:4' '0:12' '0:10' '0:13' '0:11' '0:15' '0:18' '0:16' '0:17' '1:2' '1:4' '1:12' '1:9' '1:10' '1:5' '1:3' '1:1' '1:7' '1:6' '1:11' '1:8' '1:13' '1:14' '1:15' '1:16' '1:18' '1:17' '1:19']
I want to run find_marker_genes between 0:0 and 1:0, that is, to compare the same cluster across the two batches.
First, I find DE marker genes across all clusters:
dataM.tl.find_marker_genes(cluster_res_key='leiden', method='t_test', use_highly_genes=False, use_raw=True, res_key='marker_genes')
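One thing that may be worth checking before calling find_marker_genes is whether the requested labels occur in the combined column at all: in the unique values printed above, the cluster numbers run from 1 to 19 and no label ends in ':0'. Below is a small sketch of such a check, using a hypothetical helper `missing_groups` and a shortened toy list of labels rather than the real column.

```python
import numpy as np


def missing_groups(available, requested):
    """Return the requested labels that are absent from `available`."""
    available_set = set(map(str, available))
    return [g for g in requested if str(g) not in available_set]


# Shortened toy copy of the unique labels (illustration only).
unique_batch_leiden = np.array(["0:1", "0:2", "0:3", "1:1", "1:2", "1:3"])

# Labels that do not occur in the column are reported back.
absent = missing_groups(unique_batch_leiden, ["0:0", "1:0"])
# Labels that do occur produce an empty list.
present = missing_groups(unique_batch_leiden, ["0:1", "1:1"])
print(absent)
print(present)
```

A non-empty result means at least one of the case or control labels does not exist under that name. This is separate from the KeyError itself, but it would matter once the call gets past that point.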
[2023-10-17 11:40:36][Stereo][4136646][MainThread][140077036418880][st_pipeline][37][INFO]: start to run find_marker_genes...
[2023-10-17 11:41:55][Stereo][4136646][MainThread][140077036418880][st_pipeline][40][INFO]: find_marker_genes end, consume time 79.5135s.
It gives the expected result.
Now, let's re-run find_marker_genes using case_groups and control_groups, so that it only compares my groups of interest. The documentation describes them as:
case_groups (Union[str, ndarray, list]) – case group, default all clusters.
control_groups (Union[str, ndarray, list]) – control group, default the rest of groups.
I changed cluster_res_key to 'batch_leiden_combination', since the combined batch:leiden labels are stored there:
dataM.tl.find_marker_genes( cluster_res_key='batch_leiden_combination', method='t_test', use_highly_genes=False, use_raw=True, res_key='marker_genes', case_groups=['0:0'], control_groups=['0:1'] )
[2023-10-17 11:47:21][Stereo][4136646][MainThread][140077036418880][st_pipeline][37][INFO]: start to run find_marker_genes...
Traceback (most recent call last):
File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'group'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "", line 1, in
File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/stereo/core/st_pipeline.py", line 39, in wrapped
res = func(*args, **kwargs)
File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/stereo/core/st_pipeline.py", line 909, in find_marker_genes
if self.result[cluster_res_key]['group'].unique().size <= 1:
File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/pandas/core/series.py", line 981, in __getitem__
return self._get_value(key)
File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/pandas/core/series.py", line 1089, in _get_value
loc = self.index.get_loc(label)
File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
raise KeyError(key) from err
KeyError: 'group'
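The traceback passes through pandas/core/series.py, which suggests that self.result[cluster_res_key] returned a pandas Series, and indexing a Series with the label 'group' raises this KeyError. The sketch below reproduces that behaviour with pandas alone on toy data. The two-column table with 'bins' and 'group' is an assumption about the layout stereopy expects for a cluster result; only the 'group' column is confirmed by the traceback.

```python
import pandas as pd

# Toy stand-in for the combined label column (illustration only).
labels = pd.Series(["0:1", "0:2", "1:1", "1:2"], name="batch_leiden_combination")

# Indexing a Series with a label that is not in its index raises KeyError,
# which matches the error shown in the traceback.
try:
    labels["group"]
    series_error = None
except KeyError as err:
    series_error = err.args[0]

# Assumed layout: a DataFrame with a 'group' column, plus a 'bins' column
# of cell names (the 'bins' column is an assumption, not confirmed here).
cluster_table = pd.DataFrame({
    "bins": ["cell_a", "cell_b", "cell_c", "cell_d"],  # hypothetical names
    "group": labels.to_numpy(),
})

# The same expression that failed on the Series works on this table.
n_groups = cluster_table["group"].unique().size

print(series_error, n_groups)
```

If stereopy does expect this layout, a table like this would have to be available under the chosen cluster_res_key before find_marker_genes is called; whether and how that can be registered is not something the output above confirms.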
Can anyone suggest how to use case_groups and control_groups to compare specific batch_leiden groups of interest?