I would like to run differential expression in Scanpy between cells that express a specific gene and cells that do not, but I cannot figure out how to do it. I know I can filter for the gene like this, but that just removes all the cells not expressing it.
My terminology for Python and Scanpy may be a little off here, but you could try adding an observation annotation (a column in adata.obs) that records whether the gene is expressed in each cell, and then run differential gene expression with groupby on that column. This is with a huge assist from ChatGPT 3.5, but check it out:
This link served as inspiration for the question I asked ChatGPT:
Certainly! Here's the whole script incorporating the steps to add the
custom observation and perform differential gene expression analysis:
import scanpy as sc
# Assuming you have already loaded your AnnData object, which contains gene expression data
adata = ... # Load or create your AnnData object
# 1. Define the gene of interest
gene_of_interest = 'your_gene' # Specify the gene of interest
# 2. Determine expression status of the gene for each cell
# For demonstration purposes, let's assume your AnnData object contains gene expression in the .X attribute
# obs_vector returns a 1-D array of per-cell values, whether .X is dense or sparse
is_expressed = adata.obs_vector(gene_of_interest) > 0
# 3. Convert the expression status to custom strings
expression_status = ['cells_where_gene_is_expressed' if expr else 'cells_where_gene_is_NOT_expressed' for expr in is_expressed]
# 4. Add the custom observation to the AnnData object
adata.obs['gene_expression_status'] = expression_status
# 5. Perform differential gene expression analysis based on the custom observation
# (rank_genes_groups expects log-transformed expression values)
sc.tl.rank_genes_groups(adata, groupby='gene_expression_status', method='t-test')
# Access the results of differential gene expression analysis
results = adata.uns['rank_genes_groups']
# Now you can access the ranked genes for each group
# results['names'] contains the gene names, ranked per group
# results['logfoldchanges'] contains the log fold changes
# results['pvals'] contains the p-values, etc.
Replace 'your_gene' with the gene you are interested in. Adjust the
observation name 'gene_expression_status' as needed. After running
this script, you'll have performed differential gene expression
analysis based on the custom observation 'gene_expression_status',
comparing the gene expression profiles between cells where the gene is
expressed and cells where it is not expressed.
This is me again: you can also change method='t-test' to another test that rank_genes_groups supports, such as 'wilcoxon'.