How to make a UMAP for single cell data and color cells by average expression of a list of genes in scanpy?
22 months ago
bioinfo ▴ 150

Hello,

I have single-cell and bulk RNA-seq data, and I have performed some basic analysis on both. For the bulk RNA-seq data I ran DESeq2 and obtained a list of DE genes. I would like to make a UMAP where the cells are colored by the average expression of the bulk signature genes, but I am having trouble doing it. I am working with scanpy.

I have done the below so far:

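# Collect the bulk DE gene names as a plain Python list
bulk_de_genes_list = bulk_de_genes['Gene'].tolist()
# Keep only the signature genes; .copy() makes adata2 a real AnnData instead of a view
adata2 = adata[:, adata.var_names.isin(bulk_de_genes_list)].copy()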
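Before subsetting, it can help to check how many signature genes are actually present in the single-cell data, since bulk and single-cell tables sometimes use different gene identifiers. A minimal sketch with made-up gene names; the helper `split_signature` is hypothetical, and in practice `var_names` would be `adata.var_names`:

```python
def split_signature(signature, var_names):
    """Split a gene signature into genes found in var_names and genes that are missing."""
    available = set(var_names)
    found = [g for g in signature if g in available]
    missing = [g for g in signature if g not in available]
    return found, missing

# Toy example: made-up gene names standing in for adata.var_names
var_names = ["CD3E", "CD4", "MS4A1", "LYZ"]
signature = ["CD4", "LYZ", "GAPDH"]

found, missing = split_signature(signature, var_names)
print(f"{len(found)} of {len(signature)} signature genes found; missing: {missing}")
```

If many genes end up in `missing`, the average would be based on only a small part of the signature.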

This seems to have worked, but I am having issues with the next part. I am not sure whether the average expression of the bulk signature genes should be computed with average_expression = adata2.X.mean(axis=0) or with cell_averages = adata2.X.mean(axis=1), so I have tried both:
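The difference between the two axes can be seen on a toy matrix. The sketch below assumes the usual AnnData layout (cells in rows, genes in columns) and uses a small NumPy array as a stand-in for `adata2.X`:

```python
import numpy as np

# Toy matrix laid out like adata.X: 3 cells (rows) x 2 signature genes (columns)
X = np.array([[1.0, 3.0],
              [2.0, 4.0],
              [0.0, 8.0]])

per_gene = X.mean(axis=0)  # collapses the cell axis: one value per gene
per_cell = X.mean(axis=1)  # collapses the gene axis: one value per cell

print(per_gene.shape, per_gene)
print(per_cell.shape, per_cell)
```

Only `per_cell` has one entry per cell, so only the `axis=1` result can be used to color cells on a UMAP.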

First:

average_expression = adata2.X.mean(axis=0)
# Divide the average expression into bins
bins = np.histogram(average_expression, bins='fd')[1]

# Assign a color to each bin
cmap = plt.get_cmap('viridis')
colors = cmap(np.digitize(average_expression, bins) / len(bins))
# Run UMAP
sc.pp.neighbors(adata2, n_neighbors=10)
sc.tl.umap(adata2)
fig, ax = plt.subplots()
sc.pl.umap(adata2, color=colors, cmap='viridis')
plt.show()

This gives me the errors below:

TypeError: unhashable type: 'numpy.ndarray'
ValueError: Image size of 1932x155200 pixels is too large. It must be less than 2^16 in each direction
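A likely reading of both errors is that `color` expects names rather than colors: `sc.pl.umap` takes one key or a list of keys (columns of `adata.obs` or gene names) and draws one panel per key. Passing an RGBA array would then mean that each row is looked up as if it were a key, which fails because NumPy arrays cannot be hashed, and one panel per row would explain the oversized figure. The sketch below only reproduces the hashing failure, with a plain dict standing in for the name lookup; it does not call scanpy, and the dict contents are made up:

```python
import numpy as np

# Stand-in for the obs columns that a plotting function would search by name
obs_columns = {"n_genes": 0, "bulk_de_gene_average": 1}

# One RGBA row, like a single entry of the `colors` array from the first attempt
rgba_row = np.array([0.27, 0.00, 0.33, 1.00])

try:
    obs_columns[rgba_row]  # arrays are unhashable, so they cannot be used as keys
    error_message = None
except TypeError as err:
    error_message = str(err)

print(error_message)
```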

Second attempt:

# Calculate, for each cell, the average expression across the signature genes
cell_averages = adata2.X.mean(axis=1)

# Add the average expression of the bulk signature genes as a new variable
# to the AnnData object
adata2.obs['bulk_de_gene_average'] = cell_averages

# Plot the UMAP and color the cells based on the average expression of the bulk
# signature genes
sc.pl.umap(adata2, color='bulk_de_gene_average', cmap='viridis')

This produces the UMAP, but I am not sure if it is correct. Thank you for the help.

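Edit: It seems that the second way is correct: mean(axis=1) averages across the signature genes and gives one value per cell, which is what the coloring needs.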
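A per-cell average like the one in the second attempt can be computed a bit more defensively. When `X` is a SciPy sparse matrix, `.mean(axis=1)` may return a 2-D `np.matrix` of shape (n_cells, 1) rather than a flat array. The helper `per_cell_mean` below is hypothetical, and `np.asmatrix` is used only to imitate that 2-D return shape on toy data:

```python
import numpy as np

def per_cell_mean(X):
    """Return a flat 1-D array with the mean of each row (cell) of X.

    np.asarray(...).ravel() flattens the (n_cells, 1) result that
    np.matrix-like objects return from .mean(axis=1).
    """
    return np.asarray(X.mean(axis=1)).ravel()

# Toy data: 4 cells x 3 signature genes
dense = np.array([[0.0, 3.0, 3.0],
                  [1.0, 1.0, 1.0],
                  [6.0, 0.0, 0.0],
                  [2.0, 4.0, 6.0]])
as_matrix = np.asmatrix(dense)  # imitates the 2-D shape of a sparse .mean result

scores = per_cell_mean(dense)
scores_from_matrix = per_cell_mean(as_matrix)
print(scores)
```

With an AnnData object, the result could be stored on the full dataset, for example `adata.obs['bulk_de_gene_average'] = per_cell_mean(adata[:, bulk_de_genes_list].X)` (assuming every gene in the list is present in `adata.var_names`), and then plotted with `sc.pl.umap(adata, color='bulk_de_gene_average')`. This keeps the embedding computed on all genes instead of one recomputed on the subset. scanpy also provides `sc.tl.score_genes`, which computes a related score that subtracts the average expression of a reference gene set.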

single-cell scanpy RNA-seq UMAP • 2.6k views
22 months ago
bioinfo ▴ 150

It seems that the second attempt I mentioned in the post is correct.

22 months ago
Mensur Dlakic ★ 28k

I will answer your question in general. If you have three columns of data, where two of them are X and Y coordinates and the third is some quantity to be used for coloring, it is absolutely trivial to do what you are asking. The only requirement is that the rows of X and Y are correctly aligned with the values in the third column.

In python with matplotlib, a generic command is:

matplotlib.pyplot.scatter(X, Y, cmap='rainbow', c=labels, s=2)

where the columns X and Y contain the coordinates, and the column labels contains the values used for coloring. cmap selects the colormap to be used.
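A self-contained sketch of the generic command, using random numbers as stand-ins for the coordinates and the coloring values (all data here are made up, and the output filename is arbitrary):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_cells = 200
X = rng.normal(size=n_cells)       # stand-in for the first UMAP coordinate
Y = rng.normal(size=n_cells)       # stand-in for the second UMAP coordinate
labels = rng.random(size=n_cells)  # stand-in for the per-cell average expression

# X, Y and labels must have the same length and the same row order
assert len(X) == len(Y) == len(labels)

fig, ax = plt.subplots()
points = ax.scatter(X, Y, c=labels, cmap="viridis", s=2)
fig.colorbar(points, ax=ax, label="average expression")
fig.savefig("umap_colored.png", dpi=150)
```

With scanpy, the coordinates would typically come from `adata.obsm['X_umap']` after `sc.tl.umap`, taking column 0 as X and column 1 as Y, and the coloring values from a column of `adata.obs`; both are in the same cell order, which satisfies the alignment requirement.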


Thank you. I updated the question because I am trying to find a way to do this in scanpy and I find manipulating the object a bit confusing.
