How to plot proportion of cells in each cluster with scanpy?
15 months ago
bioinfo ▴ 150

Hello

I am analyzing single-cell data with scanpy. I have used Leiden to cluster my samples. I would like to find out how many cells are in each cluster and plot the proportion of cells in each cluster. I have cross-posted the question on Stack Overflow, but I have not gotten an answer, so I am trying here too (https://stackoverflow.com/questions/77160135/how-to-plot-proportion-of-cells-in-each-cluster-with-scanpy)

I have found the code shown below from this link https://nbisweden.github.io/workshop-scRNAseq/labs/compiled/scanpy/scanpy_04_clustering.html

tmp = pd.crosstab(adata.obs['leiden_0.6'],adata.obs['type'], normalize='index')
tmp.plot.bar(stacked=True).legend(loc='upper right')

However, I am not sure how to adjust it for my data because I don't have 2 groups. I just want a graph that shows that cluster 1 is 10% of the total cells, cluster 2 is 20% etc.

Thank you

scRNA-seq scanpy single-cell • 3.5k views
15 months ago
Radu Tanasa ▴ 140

Hi. If I get this right, you simply need to compute the percentage of cells in each cluster at the dataset level?

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Percentage of all cells that fall in each cluster
data = {}
for v in adata.obs['leiden_1_0'].unique():
    data[v] = adata[adata.obs['leiden_1_0'] == v].shape[0] / adata.shape[0] * 100
df = pd.DataFrame.from_dict(data, orient='index', columns=['percentage'])
df['cluster'] = df.index
df = df.reset_index(drop=True)
sns.barplot(data=df, x='cluster', y='percentage')
plt.show()

You can then save your df as a CSV file if you want with df.to_csv('path')
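
The same percentages can probably also be obtained without a loop, using pandas' value_counts with normalize=True. This is only a minimal sketch: the small obs table below is a made-up stand-in for adata.obs, and the column name 'leiden_0.6' and the output file name are assumptions to adapt to your own data.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for adata.obs; with real data use: obs = adata.obs
obs = pd.DataFrame(
    {"leiden_0.6": ["0", "0", "0", "1", "1", "2", "0", "1", "0", "2"]}
)

# Fraction of all cells in each cluster, converted to a percentage
percentages = obs["leiden_0.6"].value_counts(normalize=True).sort_index() * 100

# One bar per cluster
ax = percentages.plot.bar()
ax.set_xlabel("cluster")
ax.set_ylabel("percentage of cells")
plt.tight_layout()
plt.savefig("cluster_percentages.png")  # hypothetical file name
```

Since value_counts returns a pandas Series indexed by cluster label, percentages.to_csv('path') works the same way as for the data frame above.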

Thank you so much. That worked perfectly and it was much faster than what I was trying.

15 months ago
bk11 ★ 3.0k

Please try this:

tmp = pd.crosstab(adata.obs['leiden_0.6'],adata.obs['type'], normalize='columns').T.plot(kind='bar', stacked=True)
tmp.legend(title='leiden_0.6', bbox_to_anchor=(1.26, 1.02),loc='upper right')

Thank you for the suggestion. Unfortunately, that does not work for me because I do not have a "type" column in adata.obs. I think that in the end I need a chart with one bar per leiden_0.6 cluster showing that cluster's percentage of the total cells. It would be nice if I could also get the number of cells per cluster written to a separate file.
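
For reference, a per-cluster cell count written to its own file, as asked for here, might look like the sketch below. It assumes the clustering sits in a column called 'leiden_0.6'; the obs table is a made-up stand-in for adata.obs, and the column names n_cells and percentage and the CSV file name are placeholders.

```python
import pandas as pd

# Hypothetical stand-in for adata.obs; with real data use: obs = adata.obs
obs = pd.DataFrame(
    {"leiden_0.6": ["0", "0", "0", "1", "1", "2", "0", "1", "0", "2"]}
)

# Number of cells in each cluster, ordered by cluster label
counts = obs["leiden_0.6"].value_counts().sort_index()

# Counts and percentages side by side, one row per cluster
summary = pd.DataFrame(
    {"n_cells": counts, "percentage": counts / counts.sum() * 100}
)
summary.index.name = "cluster"

# Write the table to a separate file (placeholder file name)
summary.to_csv("cells_per_cluster.csv")
```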

Why did you have type in your code above, then?

Because that was the code from the link. It is how they specified they had 2 groups of samples but I don't have 2 groups. Sorry for the confusion.

The following code writes the percentages onto your stacked bar plot.

import pandas as pd
import matplotlib.pyplot as plt

cross_tab = pd.crosstab(adata.obs['leiden_0.6'], adata.obs['type'], normalize='columns') * 100
ax = cross_tab.plot(kind='bar', stacked=True, figsize=(8, 6))
ax.legend(title="leiden_0.6", bbox_to_anchor=(1.18, 1.02), loc="upper right")
# Add a percentage label at the centre of each bar segment
for p in ax.patches:
    width, height = p.get_width(), p.get_height()
    x, y = p.get_xy()
    ax.annotate(f'{height:.1f}%', (x + width/2, y + height/2), ha='center', va='center')
# Set labels and title
plt.xlabel('Category')
plt.ylabel('Percentage')
plt.title('Stacked Bar Plot with Percentage Labels')

Thank you for replying again. The "type" still causes issues but the reply by Radu Tanasa worked.
