Hi there, I've been using Roary for 3 years now... awesome tool!
I'm just curious why none of the plots it produces are single values. For example, the "Number of unique genes" plot should just be one fixed value per genome, the number of unique genes, am I right? So how come it's a series of box-and-whisker plots, one per genome, that sometimes even has outlier points? How can there be a range of values per genome unless it calculates the counts for different nucleotide identity values or something? It's not as if it also produced a range of gene_presence_absence.csv files.
Does anyone know what these plot ranges are based on?
EDIT: I looked all over, but there is no explanation anywhere of where the ranges come from. So I ended up writing an extensive R script that calculates the counts for you from the gene_presence_absence.csv file... but I would still be interested in the reason for this weird output if anyone knows.
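For anyone wanting to do the same outside R, here is a minimal Python sketch of the first step: turning gene_presence_absence.csv into a 0/1 matrix (rows = gene clusters, columns = genomes). It assumes the first 14 columns are per-cluster metadata, which matches the Roary 3.x header as far as I know but may differ in your version; the function name and N_METADATA_COLS are hypothetical. Recent Roary versions may also write gene_presence_absence.Rtab, which is already a 0/1 matrix, so check for that first.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

# ASSUMPTION: the first 14 columns of gene_presence_absence.csv ("Gene",
# "Non-unique Gene name", "Annotation", "No. isolates", ...) describe the
# cluster and every later column is one genome. Check the header of your own
# file and adjust n_meta if it differs.
N_METADATA_COLS = 14


def read_presence_absence(
    handle: TextIO, n_meta: int = N_METADATA_COLS
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Return (genome names, {cluster name: 0/1 list with one entry per genome})."""
    reader = csv.reader(handle)
    header = next(reader)
    genomes = header[n_meta:]
    matrix: Dict[str, List[int]] = {}
    for row in reader:
        # A non-empty cell holds that genome's locus tag(s) for the cluster;
        # an empty cell means the cluster is absent from that genome.
        matrix[row[0]] = [1 if cell.strip() else 0 for cell in row[n_meta:]]
    return genomes, matrix


if __name__ == "__main__":
    # Toy file with only 2 metadata columns, so n_meta=2 here.
    toy = "Gene,Annotation,g1,g2,g3\nclusA,x,a_1,b_1,\nclusB,y,,,c_7\n"
    print(read_presence_absence(io.StringIO(toy), n_meta=2))
```

Using the csv module rather than splitting on commas matters here, because Roary quotes its fields and annotations can contain commas.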
I've used Roary a fair bit. None of the plots seemed unusual to me, but I'm struggling to picture them now. Can you show an example of the specific plot you mean?
For example, this is the default plot you get for new genes per genome... it's a box-and-whisker plot, meaning that each genome has a "range" of new genes... which is of course totally absurd, unless Roary analyzes these new genes under some sort of "threshold" (for example, across a range of different identity scores).
But the troubling one for me was this one, the "unique" genes per genome plot... A unique gene is, by definition, found in only one genome, so each genome should have a single fixed count. I'm really confused where this range comes from.
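One possible explanation, offered as an assumption rather than something confirmed in this thread: if the x-axis is the number of genomes added so far (check the axis label on your plot), a range at each position would appear naturally if the counts were repeated over several random orderings of the genomes. The sketch below illustrates that idea on a 0/1 matrix; the function name, the default of 10 orderings and the seed are illustrative choices, not Roary's code.

```python
import random
from typing import Dict, List


def accumulation_counts(
    matrix: Dict[str, List[int]], n_orderings: int = 10, seed: int = 0
) -> Dict[str, List[List[int]]]:
    """Count new and unique clusters as genomes are added in random order.

    HYPOTHETICAL illustration. Returns {"new": rows, "unique": rows}, one row
    per random ordering; row[k - 1] is the count after k genomes are added.
    """
    rng = random.Random(seed)
    rows = list(matrix.values())
    n_genomes = len(rows[0]) if rows else 0
    result: Dict[str, List[List[int]]] = {"new": [], "unique": []}
    for _ in range(n_orderings):
        order = list(range(n_genomes))
        rng.shuffle(order)
        seen = [0] * len(rows)  # how many added genomes carry each cluster
        new_row: List[int] = []
        unique_row: List[int] = []
        for g in order:
            new = 0
            for i, row in enumerate(rows):
                if row[g]:
                    if seen[i] == 0:
                        new += 1  # first time this cluster appears
                    seen[i] += 1
            new_row.append(new)
            # "Unique" so far: clusters seen in exactly one added genome.
            unique_row.append(sum(1 for s in seen if s == 1))
        result["new"].append(new_row)
        result["unique"].append(unique_row)
    return result
```

Taking the k-th entry from every ordering gives the spread that a box at position k would summarize. Under this reading, the unique count at the last position is the same for every ordering, because all genomes are included by then.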
And when I convert gene_presence_absence.csv to a binary matrix, find the rows (orthologous groups) that have only a single entry, and then count how many of those rows belong to each genome, my plot looks nothing like this... so even the values these plots should represent, based on the presence/absence matrix, don't appear in them :S
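The per-genome calculation described above can be sketched as a short Python function over a 0/1 matrix (rows = orthologous groups, columns = genomes). This mirrors the calculation in this post, not whatever Roary does internally; the function name is hypothetical.

```python
from typing import Dict, List


def unique_genes_per_genome(
    genomes: List[str], matrix: Dict[str, List[int]]
) -> Dict[str, int]:
    """Count, for each genome, the clusters found in that genome and no other.

    Assumes each matrix row is a 0/1 list aligned with `genomes`.
    """
    counts = {name: 0 for name in genomes}
    for row in matrix.values():
        if sum(row) == 1:  # cluster present in exactly one genome
            counts[genomes[row.index(1)]] += 1
    return counts
```

This gives one fixed number per genome, so it would only line up with a Roary plot if that plot's x-axis really indexes individual genomes.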