Hey guys, I have a table with about 1600 rows, like this:
Element-ID,Protein-ID,Protein-Product,Virus-ID,Super-Kingdom,Order,Family,Genus,Species,Sense,Start,End,EVE-length
ctg_1003:180657-180956,YP_184770.1,hypothetical protein,39640,Viruses,Unclass,Polydnaviridae,Bracovirus,Unclass,neg,180657,180956,299
ctg_1003:283549-284079,YP_184879.1,hypothetical protein,39640,Viruses,Unclass,Polydnaviridae,Bracovirus,Unclass,pos,283549,284079,530
ctg_1007:58711-59043,YP_184770.1,hypothetical protein,39640,Viruses,Unclass,Polydnaviridae,Bracovirus,Unclass,neg,58711,59043,332
ctg_100:908810-909199,YP_184882.1,hypothetical protein,39640,Viruses,Unclass,Polydnaviridae,Bracovirus,Unclass,neg,908810,909199,389
ctg_1011:242875-243240,YP_001426207.1,hypothetical protein,399781,Viruses,Algavirales,Phycodnaviridae,Chlorovirus,Paramecium,bursaria Chlorella virus A1,pos,242875,243240,365
Basically, it's a CSV file with taxonomy information from endogenous viruses. I read this CSV file into a pandas DataFrame (I'm working with Python 3.7):
df = pd.read_csv('eve_tax.csv', sep=',',header = 0)
And create a swarm plot with seaborn:
beeswarn_plot =sns.swarmplot(x='EVE-length',y='Super-Kingdom', hue='Family', data=df)
So, my code, at the moment, is:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.style as style
sns.set(style="ticks")
style.use('seaborn-poster')
df = pd.read_csv('eve_tax.csv', sep=',',header = 0)
beeswarn_plot =sns.swarmplot(x='EVE-length',y='Super-Kingdom', hue='Family', data=df)
sns.despine(fig=None, top=True, right=True, left=True, bottom=False, offset=None, trim=False)
beeswarn_plot.set_ylabel('')
beeswarn_plot.set_yticks([])
beeswarn_plot.set_xlabel('EVE length (bp)')
beeswarn_plot.legend(loc='upper center', bbox_to_anchor=(0.7, 1.05), ncol=4, fancybox=True, prop={'size': 11},title='EVEs Family')
plt.savefig('eves_tax_beeswarn_plot.pdf',dpi=300)
Which plots a 'beeswarm' plot like this:
So, what do I want?
For some families I have a low number of elements, such as Orthomyxoviridae (n = 5) and Poxviridae (n = 5). I want to put all families with 5 or fewer elements into a category 'Others', and then plot with seaborn all families with more than 5 elements plus the 'Others' category.
Can anyone help? Thanks !
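To make the goal concrete, here is a minimal sketch of the kind of relabelling I have in mind. It is one possible approach, not necessarily the best one. The helper name lump_rare_families, its parameters, and the small demo frame are made up for illustration; only the 'Family' column name comes from my real table.

```python
import pandas as pd


def lump_rare_families(df, column="Family", min_count=6, other_label="Others"):
    """Return a copy of df in which values of `column` that occur fewer than
    `min_count` times are replaced by `other_label`.

    Hypothetical helper: min_count=6 means families with 5 or fewer
    rows are moved into 'Others'.
    """
    # Per-row count of how often that row's family occurs in the whole column
    counts = df[column].map(df[column].value_counts())
    out = df.copy()
    # Keep the family name where it is frequent enough, otherwise relabel it
    out[column] = out[column].where(counts >= min_count, other_label)
    return out


# Made-up data for illustration only (not taken from eve_tax.csv)
demo = pd.DataFrame({
    "Family": ["Polydnaviridae"] * 7 + ["Poxviridae"] * 5 + ["Orthomyxoviridae"] * 2,
    "EVE-length": range(14),
})
lumped = lump_rare_families(demo)
print(lumped["Family"].value_counts())
```

With the real data, the idea would be to call df = lump_rare_families(df) right after pd.read_csv and before sns.swarmplot, so that hue='Family' sees the merged 'Others' category. Note that rows with a missing family would also end up in 'Others' in this sketch.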