Question

Pangenome visualization: flowerplot?

10.2 years ago

Christian ▴ 30

I have partitioned the gene complement of 21 strains of a particular species of Strep into core, dispensable, and unique sets, but I'm at a loss as to how best to represent these data. I originally considered a Venn diagram illustrating the total size of each partition, but wasn't satisfied that it portrayed my data accurately. After some searching I found what looks like a more appropriate representation (Fig. A, B and C), something the authors call a "flowerplot" ¹ (a new one on me). I've been trying to re-create this sort of visualization manually using matplotlib in Python 3.3, since no package seems to provide comparable output more automatically, but I haven't had much success.

What I've tried:

Adding Ellipse patches to Cartesian axes. Not satisfactory, because the Ellipse patch's xy argument only sets the ellipse's center at (x, y); I'd still need some way to rotate each ellipse about the origin to achieve the desired effect.
Adding Ellipse patches to polar axes. This was a complete mess; I can get one good ellipse but can't place any others reliably (most likely due to my limited understanding of polar coordinates!).

Additionally, using Ellipse patches might not end up being the best move, since I'll need to annotate each ellipse with, at the very least, strain ID and count information.
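
For what it's worth, the rotation problem on Cartesian axes can be sketched as follows: put each ellipse's center on a circle around the origin and pass the same angle to the Ellipse patch's `angle` argument (in degrees), so its long axis points radially. This is only a rough sketch, not a re-creation of the published figures; the function names `petal_layout` and `draw_flowerplot`, the petal dimensions, and the choice to show one count per strain are all assumptions made for illustration.

```python
import math


def petal_layout(n_petals, distance=1.0):
    """Return (x, y, angle_deg) for n_petals evenly spaced around the origin.

    (x, y) is the petal center; angle_deg is the rotation that makes the
    petal's long axis point away from the origin.
    """
    layout = []
    for i in range(n_petals):
        theta = 2 * math.pi * i / n_petals
        layout.append((distance * math.cos(theta),
                       distance * math.sin(theta),
                       math.degrees(theta)))
    return layout


def draw_flowerplot(petal_counts, core_count, petal_length=2.0, petal_width=0.5):
    """Draw one petal per strain plus a central disc for the core count.

    petal_counts: dict mapping strain ID -> count to show on that petal
    (hypothetical input format).
    """
    # Imported here so the geometry helper above works without matplotlib.
    import matplotlib.pyplot as plt
    from matplotlib.patches import Circle, Ellipse

    fig, ax = plt.subplots(figsize=(6, 6))
    distance = petal_length / 2.0  # inner tip of every petal sits at the origin
    layout = petal_layout(len(petal_counts), distance)
    for (strain, count), (x, y, angle) in zip(petal_counts.items(), layout):
        # width lies along the x-axis before rotation, so rotating by
        # `angle` makes the long axis radial.
        ax.add_patch(Ellipse((x, y), width=petal_length, height=petal_width,
                             angle=angle, alpha=0.5, edgecolor="black"))
        # Annotate towards the outer end of the petal.
        ax.text(1.5 * x, 1.5 * y, "{}\n{}".format(strain, count),
                ha="center", va="center", fontsize=8)
    # Central disc drawn on top of the overlapping inner tips.
    ax.add_patch(Circle((0, 0), radius=0.25 * petal_length,
                        facecolor="white", edgecolor="black", zorder=3))
    ax.text(0, 0, "Core\n{}".format(core_count),
            ha="center", va="center", zorder=4)
    limit = 1.1 * petal_length
    ax.set_xlim(-limit, limit)
    ax.set_ylim(-limit, limit)
    ax.set_aspect("equal")
    ax.axis("off")
    return fig
```

Calling `draw_flowerplot` with a dict of strain IDs and counts and then saving the returned figure with `fig.savefig(...)` should give a basic flower; with 21 strains the petal width and label font size will probably need tuning to limit overlap.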

Is anyone familiar with a way to either effectively visualize these data, or perhaps duplicate the plots I linked?

¹ Sugawara, et al. (2013). Comparative genomics of the core and accessory genomes of 48 Sinorhizobium strains comprising five genospecies. Genome Biology 2013, 14:R17. doi:10.1186/gb-2013-14-2-r17 or http://genomebiology.com/2013/14/2/R17.

genome visualization • 3.2k views

updated 2.9 years ago by Ram 44k • written 10.2 years ago by Christian ▴ 30

The flowerplots you link to would be better represented as a simple table with a "sum" or "total" column for the genus/species. Then you could sort them in a meaningful order.

10.2 years ago by Ryan Dale 5.0k

Ram · Accepted Answer · 2014-09-16

10.2 years ago

Ryan Dale 5.0k

One option might be "binary heatmaps". I find them useful for visualizing combinatorial ChIP-seq binding; see, for example, Figure 6E here.

Using the row/column format of that figure, in your case you would have a row for each strain and a column for each gene. Instead of just a black/white 1/0 as in that example, you could encode the gene type as 1/2/3 for core/dispensable/unique. So you'd have at least 3 colors in the heatmap. The trick would then be to play around with different sort orders or clustering to get some meaningful interpretation.
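
A minimal sketch of that encoding, assuming the data are available as a dict mapping each strain ID to the set of gene IDs it carries (a hypothetical input format). The helper names `encode_pangenome` and `plot_binary_heatmap` are made up for illustration, and using 0 for "gene absent from this strain" is an added assumption on top of the 1/2/3 scheme. Columns are sorted by category and then by how many strains carry the gene, which is just one of the possible sort orders.

```python
def encode_pangenome(presence):
    """Encode a strain -> set-of-genes mapping as a strain x gene matrix.

    Cell values: 0 = gene absent (assumption), 1 = core (in every strain),
    2 = dispensable (in some strains), 3 = unique (in exactly one strain).
    """
    strains = sorted(presence)
    n_strains = len(strains)

    # Count how many strains carry each gene.
    counts = {}
    for genes in presence.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1

    def category(gene):
        if counts[gene] == n_strains:
            return 1
        if counts[gene] == 1:
            return 3
        return 2

    # Core first, then dispensable, then unique; most widespread first.
    genes = sorted(counts, key=lambda g: (category(g), -counts[g], g))
    matrix = [[category(g) if g in presence[s] else 0 for g in genes]
              for s in strains]
    return strains, genes, matrix


def plot_binary_heatmap(strains, genes, matrix):
    """Draw the encoded matrix with one colour per category."""
    # Imported here so the encoding step works without matplotlib.
    import matplotlib.pyplot as plt
    from matplotlib.colors import ListedColormap

    # Colour order matches the codes 0, 1, 2, 3.
    cmap = ListedColormap(["white", "tab:blue", "tab:orange", "tab:red"])
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.imshow(matrix, cmap=cmap, vmin=0, vmax=3,
              aspect="auto", interpolation="nearest")
    ax.set_yticks(range(len(strains)))
    ax.set_yticklabels(strains)
    ax.set_xlabel("Genes (core, then dispensable, then unique)")
    ax.set_ylabel("Strain")
    return fig
```

Sorting or clustering the rows as well (for example, by the number of dispensable genes each pair of strains shares) would be the next thing to experiment with.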

updated 2.9 years ago by Ram 44k • written 10.2 years ago by Ryan Dale 5.0k

I was thinking about something along these lines, but was so enamored of the figure I referenced I couldn't move on without consulting the world at large ;)

10.2 years ago by Christian ▴ 30