Question

Non-infinite value in violin plot

8.7 years ago

izzy.yichao.cai ▴ 180

Hi all,

I am trying to draw a violin plot of gene expression across different categories. Because the expression values span a very wide range, I log-transformed the expression level (via `scale_y_log10`). To draw the violin plot with ggplot2, I used the following code:

```r
library(ggplot2)
# V13 = expression level, V14 = gene category (tab-separated file, no header)
data <- read.table("PATH", sep = "\t", header = FALSE)
p <- ggplot(data, aes(x = V14, y = V13, fill = V14)) + geom_violin(trim = FALSE)
# breaks 10^-8 ... 10^6, labelled as powers of ten
p + scale_y_log10(breaks = 10^(-8:6), labels = scales::trans_format("log10", scales::math_format(10^.x))) +
  geom_boxplot(width = 0.1) + scale_x_discrete(limits = c("A", "B", "R", "Q")) +
  scale_fill_manual(values = c("#99CCFF", "#E0E0E0", "#FFFF99", "#FFFFFF"))
```

Column 13 (V13) holds the expression level and column 14 (V14) holds the gene category. When plotting, I got this message:

Removed 229 rows containing non-finite values

This happens because some genes have zero expression, and log10(0) is -Inf, a non-finite value, so ggplot2 drops those rows. With the zeros removed, the violins for the four categories look quite similar, even though category R should have a lower expression level than the others.
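A quick way to check how much each category is affected is to count the zero-expression genes before plotting. The sketch below uses a small hypothetical data frame (made-up values, not the poster's file) with the same column names, V13 for expression and V14 for category:

```r
# Hypothetical toy data standing in for the real file (V13 = expression, V14 = category)
data <- data.frame(V13 = c(0, 0.5, 12, 0, 0, 3, 40, 0.1),
                   V14 = c("A", "A", "B", "R", "R", "R", "Q", "Q"))

log10(0)                              # -Inf, which is why these rows get removed
n_dropped <- sum(!is.finite(log10(data$V13)))
n_dropped                             # 3 rows would be removed on a log axis

# Number of zero-expression genes in each category
zeros_per_category <- tapply(data$V13 == 0, data$V14, sum)
zeros_per_category                    # A: 1, B: 0, Q: 0, R: 2
```

On the real data, `tapply(data$V13 == 0, data$V14, sum)` should show whether the 229 removed rows are concentrated in category R.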

Is there a better way to display these data without discarding the non-finite values? Ideally, a reader should be able to tell from the violin plot that "category R, which contains some non-finite (zero-expression) values, has the lowest expression level of all the categories."

Thanks a lot!

R • 2.9k views

Why not just add a small constant (a pseudocount) to your data before the log transform? That is the usual way to avoid infinite values when log-transforming expression data.

8.7 years ago by Marge ▴ 320
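The pseudocount suggestion can be sketched as follows. The data are a hypothetical simulation (category R is given 20 zero-expression genes), and the pseudocount value, one tenth of the smallest non-zero expression, is an assumption rather than a standard. Adding 1, which is common for counts, may squash the low end here, since the original axis breaks go down to 10^-8.

```r
library(ggplot2)

# Hypothetical simulated data, NOT the poster's file: category R has 20 zeros
set.seed(42)
data <- data.frame(
  V13 = c(rlnorm(50, meanlog = 2), rlnorm(50, meanlog = 2),
          c(rep(0, 20), rlnorm(30, meanlog = 0)), rlnorm(50, meanlog = 2)),
  V14 = rep(c("A", "B", "R", "Q"), each = 50)
)

# Assumed pseudocount: one tenth of the smallest non-zero value, so zeros land
# just below the observed range instead of becoming -Inf and being dropped
pseudo <- min(data$V13[data$V13 > 0]) / 10

p <- ggplot(data, aes(x = V14, y = V13 + pseudo, fill = V14)) +
  geom_violin(trim = FALSE) +
  geom_boxplot(width = 0.1) +
  scale_y_log10() +
  scale_x_discrete(limits = c("A", "B", "R", "Q")) +
  labs(x = "Category", y = "Expression + pseudocount")

print(p)
```

With this change the zero-expression genes appear as a bulge at the bottom of the R violin instead of being removed. The position of that bulge depends on the pseudocount chosen, so it is worth stating the value in the figure legend.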