Question

Alternative Graphical Representation To Venn Diagrams

4

12.5 years ago

Julien Textoris ▴ 430

Dear all,

This is a question I have been unable to answer for a long time. When doing functional annotation, how do you represent the results? You have a list of genes or proteins. You use, for example, the DAVID functional annotation tool to search for enriched functions. Then you want to represent the functions in your list, but as keywords are shared by many genes, you cannot use a simple pie chart.

An alternative would be to use Venn diagrams, where each keyword is represented by a set whose area is proportional to the number of genes annotated with that keyword, and where the overlap area is proportional to the number of genes that share two keywords. However, as you usually want to display many keywords, such a diagram is difficult or even impossible to draw.

Do you know of an alternative that shows these three items together:

1) The selected keywords, which may highlight the relevant functions in your list
2) The number of genes annotated with a given keyword
3) The proportion of genes that share two or more keywords
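Whatever the final plot, the three items can be derived from a gene-to-keyword table first. Here is a minimal Python sketch; the `annotation` dictionary and the function names are hypothetical toy examples, not output from DAVID.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy annotation: gene -> set of keywords (not real DAVID output).
annotation = {
    "gene1": {"apoptosis", "kinase"},
    "gene2": {"apoptosis"},
    "gene3": {"apoptosis", "kinase", "membrane"},
    "gene4": {"membrane"},
}

def keyword_counts(annotation):
    """Item 2: number of genes annotated with each keyword."""
    return Counter(kw for kws in annotation.values() for kw in kws)

def pairwise_shared(annotation):
    """Number of genes annotated with both keywords of each pair."""
    shared = Counter()
    for kws in annotation.values():
        for pair in combinations(sorted(kws), 2):
            shared[pair] += 1
    return shared

def multi_keyword_fraction(annotation):
    """Item 3: proportion of genes that carry two or more keywords."""
    multi = sum(1 for kws in annotation.values() if len(kws) >= 2)
    return multi / len(annotation)

counts = keyword_counts(annotation)
shared = pairwise_shared(annotation)
fraction = multi_keyword_fraction(annotation)

print(dict(counts))
print(dict(shared))
print(fraction)
```

The keyword pairs are stored as alphabetically sorted tuples, so each pair is counted under a single key.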

Thanks for any hint on this,

Julien

annotation • 14k views

ADD COMMENT • link updated 8.0 years ago by vladimir.yu.kiselev ▴ 30 • written 12.5 years ago by Julien Textoris ▴ 430

I like JC's idea of a heatmap, but again, what's wrong with a pie chart in this scenario?

ADD REPLY • link 12.5 years ago by Arun 2.4k

I don't know if it is "wrong", but I have tested it several times: when people see a pie chart showing the annotation, they believe the whole pie is the total number of genes. As genes are annotated with many keywords, I don't really know what the areas mean. If we allow genes to be counted several times in the representation (because they fall into several categories), then the area for keyword A will be smaller than expected: if half of the genes in the list carry keyword A, but most of them are annotated with at least 6 keywords, the area for A will not be half of the circle but much smaller. I don't know if I am being really clear!
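To put numbers on this, here is a small Python sketch. It assumes a hypothetical list of 100 genes in which every gene carries exactly 6 keywords and half of the genes carry keyword A; these figures are illustrative only.

```python
# Assumed, illustrative numbers (not taken from a real annotation).
n_genes = 100
keywords_per_gene = 6          # assumption: every gene has exactly 6 keywords
genes_with_a = n_genes // 2    # half of the genes carry keyword A

# A pie chart of annotations has one unit of area per (gene, keyword) pair.
total_annotations = n_genes * keywords_per_gene

share_of_genes = genes_with_a / n_genes
share_of_pie = genes_with_a / total_annotations

print(f"{share_of_genes:.0%} of genes carry A, "
      f"but A fills only {share_of_pie:.1%} of the pie")
```

Under these assumptions the slice for A is one twelfth of the circle, even though A annotates half of the genes.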

ADD REPLY • link 12.5 years ago by Julien Textoris ▴ 430

score 5 · Answer 1 · 2012-05-26

12.5 years ago

Woa ★ 2.9k

Can this be useful?

http://brainchronicle.blogspot.com/2012/05/another-look-at-over-representation.html#more

Thanks! That seems really interesting. I'll try this.

ADD REPLY • link 12.5 years ago by Julien Textoris ▴ 430

score 2 · Answer 2 · 2012-05-25

12.5 years ago

JC 13k

What about a heatmap showing genes and biological functions? Colors can show the p-value or another numerical value, and groups can be represented with column/row separators.
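As a rough illustration of the data behind such a heatmap, the Python sketch below builds a binary gene-by-keyword matrix and sorts the genes so that identical keyword patterns sit together. The `annotation` dictionary is a hypothetical toy example, and the cells hold presence/absence rather than p-values.

```python
# Hypothetical toy annotation: gene -> set of keywords.
annotation = {
    "gene1": {"apoptosis", "kinase"},
    "gene2": {"apoptosis"},
    "gene3": {"apoptosis", "kinase", "membrane"},
    "gene4": {"membrane"},
}

def membership_matrix(annotation):
    """Return (genes, keywords, matrix) with matrix[i][j] = 1 if gene i has keyword j."""
    keywords = sorted({kw for kws in annotation.values() for kw in kws})
    # Sort genes by keyword pattern so that similar rows end up adjacent.
    genes = sorted(
        annotation,
        key=lambda g: [kw not in annotation[g] for kw in keywords],
    )
    matrix = [[1 if kw in annotation[g] else 0 for kw in keywords] for g in genes]
    return genes, keywords, matrix

genes, keywords, matrix = membership_matrix(annotation)

# Plain-text rendering: '#' marks an annotated gene/keyword pair.
print(" " * 7 + " ".join(kw[:3] for kw in keywords))
for gene, row in zip(genes, matrix):
    print(f"{gene:<6}", "   ".join("#" if v else "." for v in row))
```

The resulting `matrix` could then be handed to a plotting function (for example matplotlib's `imshow`) with numerical values in place of the 0/1 entries.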

score 2 · Answer 3 · 2016-12-05

8.0 years ago

vladimir.yu.kiselev ▴ 30

http://www.caleydo.org/tools/upset/

score 0 · Answer 4 · 2012-05-26

Hi,

I am posting this as an answer because a comment is too small, but I don't know how to implement it, so it is not really an answer...

Imagine a representation similar to a Venn diagram, where:

each keyword is represented by a circle,
whose surface area is proportional to the number of genes with this keyword,
but instead of showing the number of shared genes as an overlapping area, we define a distance between the circles that is inversely proportional to the sharing?

The problem is that I can't figure out whether this distance respects the triangle inequality AC <= AB + BC, and, since a percentage can be defined as "(A n B) / A" or "(A n B) / B", which one to use for the distance?

Example: four keywords A, B, C, D:

kwd   |   # genes
-----------------------
 A    |   320
 B    |   187
 C    |   88
 D    |   37

And let's say that A & B share 18 genes, A & C 34, B & C 60, B & D 20, C & D 10 and A & D 5.

I chose to order the keywords by decreasing number of genes (I don't really know why, but I feel it may help?). In the matrix below, each cell is the fraction of shared genes, obtained by dividing the shared count by the number of genes of the column keyword:

in the upper/right part, this is the fraction relative to the smaller keyword,
in the bottom/left part, this is the fraction relative to the larger keyword.

This gives the following matrix (values rounded to two decimals):

   |  A   |  B   |  C   |  D
--------------------------------
 A | 1    | 0.10 | 0.39 | 0.14
 B | 0.06 | 1    | 0.68 | 0.54
 C | 0.11 | 0.32 | 1    | 0.27
 D | 0.02 | 0.11 | 0.11 | 1
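For reference, a minimal Python sketch that recomputes these fractions from the counts given in the example, dividing each shared count by the gene count of the column keyword. The dictionary and function names are assumptions made for illustration.

```python
# Counts taken from the example above.
sizes = {"A": 320, "B": 187, "C": 88, "D": 37}
shared = {
    ("A", "B"): 18, ("A", "C"): 34, ("A", "D"): 5,
    ("B", "C"): 60, ("B", "D"): 20, ("C", "D"): 10,
}

def shared_count(x, y):
    """Number of genes shared by keywords x and y (a keyword shares all genes with itself)."""
    if x == y:
        return sizes[x]
    return shared[(x, y)] if (x, y) in shared else shared[(y, x)]

def fraction_matrix():
    """frac[row][col] = shared(row, col) / size(col)."""
    keywords = list(sizes)
    return {r: {c: shared_count(r, c) / sizes[c] for c in keywords} for r in keywords}

frac = fraction_matrix()

print("    " + "     ".join(sizes))
for r in sizes:
    print(r, "  ".join(f"{frac[r][c]:.2f}" for c in sizes))
```

Because the divisor changes with the column, `frac["A"]["B"]` and `frac["B"]["A"]` differ, which is exactly the asymmetry discussed here.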

We can take these numbers or "1 - these numbers"; the problem is that I don't know which part of the matrix to choose.
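One possible way around the choice, offered here as a suggestion rather than as the method from this thread, is a symmetric measure such as the Jaccard distance, 1 - |A n B| / |A u B|. It does not depend on which keyword is the divisor, and for real sets it is known to satisfy the triangle inequality. The Python sketch below applies it to the example counts and checks AC <= AB + BC for every triple; the names are illustrative assumptions.

```python
from itertools import permutations

# Counts taken from the example above.
sizes = {"A": 320, "B": 187, "C": 88, "D": 37}
shared = {
    ("A", "B"): 18, ("A", "C"): 34, ("A", "D"): 5,
    ("B", "C"): 60, ("B", "D"): 20, ("C", "D"): 10,
}

def jaccard_distance(x, y):
    """1 - |x n y| / |x u y|, computed from set sizes and the shared count."""
    if x == y:
        return 0.0
    inter = shared[(x, y)] if (x, y) in shared else shared[(y, x)]
    union = sizes[x] + sizes[y] - inter
    return 1.0 - inter / union

def triangle_violations(keywords):
    """Return all triples (a, b, c) where d(a, c) > d(a, b) + d(b, c)."""
    return [
        (a, b, c)
        for a, b, c in permutations(keywords, 3)
        if jaccard_distance(a, c)
        > jaccard_distance(a, b) + jaccard_distance(b, c) + 1e-12
    ]

violations = triangle_violations(list(sizes))

for (x, y) in shared:
    print(x, y, round(jaccard_distance(x, y), 3))
print("triangle inequality violations:", violations)
```

These distances could then be used to place the circles, for instance with a multidimensional scaling layout, although an exact 2D placement is not guaranteed to exist.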

Does anyone have an idea?

Thanks again

Julien