Hi all,
I am trying to draw a violin plot of gene expression across different categories. Because the range of expression is quite wide, I log-transformed the expression level. I am plotting with ggplot2, using the following code:
library(ggplot2)
data <- read.table("PATH", sep = "\t", header = FALSE)
p <- ggplot(data, aes(x = V14, y = V13, fill = V14)) +
  geom_violin(trim = FALSE)
p +
  scale_y_log10(
    breaks = c(0.00000001, 0.0000001, 0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000, 10000, 100000, 1000000),
    labels = scales::trans_format("log10", scales::math_format(10^.x))
  ) +
  geom_boxplot(width = 0.1) +
  scale_x_discrete(limits = c("A", "B", "R", "Q")) +
  scale_fill_manual(values = c("#99CCFF", "#E0E0E0", "#FFFF99", "#FFFFFF"))
Column 13 (V13) is the expression level, and column 14 (V14) is the category of the gene. While plotting, I got this message:
Removed 229 rows containing non-finite values
This is because some genes have 0 expression, which becomes a non-finite value (-Inf) after log transformation. Without these zero-expression values, the violin plots of the 4 categories look quite similar. However, category R should have a lower expression level than the other categories.
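To make the point about zeros concrete, here is a minimal base-R sketch. The values in expr are made up for illustration and are not taken from the real data:

```r
# Made-up expression values; the zeros stand in for unexpressed genes.
expr <- c(0, 0, 0.002, 0.5, 12, 340)

# log10(0) evaluates to -Inf, which a log scale cannot place on the axis.
logged <- log10(expr)
print(logged)

# Count the values that a log10 axis would drop as non-finite.
n_dropped <- sum(!is.finite(logged))
print(n_dropped)
```

On the real data, the same count over V13 should correspond to the 229 rows reported in the message, assuming all of the removed rows are zeros.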
Is there a way to display this data better without dropping the non-finite values? Ideally, a reader should be able to tell from the violin plot that: "The R category, which contains some non-finite values, has the lowest expression level compared to the other categories."
Thanks a lot!!!
Wouldn't it be an idea to just add a small constant (a pseudocount) to your data before doing the log transform? That is the typical approach to avoid infinite values when logging expression.
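A minimal sketch of that suggestion, using toy data in place of the real file. The data frame toy, the column V13_shifted, and the choice of pseudocount (half the smallest non-zero value) are all assumptions for illustration; the thread does not specify a value, and other choices are equally defensible:

```r
# Toy data standing in for columns V13 (expression) and V14 (category).
# Category R gets 20 zeros to mimic unexpressed genes.
set.seed(1)
toy <- data.frame(
  V13 = c(rlnorm(50), rlnorm(50), c(rep(0, 20), rlnorm(30)), rlnorm(50)),
  V14 = rep(c("A", "B", "R", "Q"), each = 50)
)

# Hypothetical pseudocount: half of the smallest non-zero expression value.
pseudocount <- min(toy$V13[toy$V13 > 0]) / 2
toy$V13_shifted <- toy$V13 + pseudocount

# Every shifted value is strictly positive, so log10 stays finite.
stopifnot(all(is.finite(log10(toy$V13_shifted))))

# Plot only if ggplot2 is installed; the layers mirror the original code.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = V14, y = V13_shifted, fill = V14)) +
    geom_violin(trim = FALSE) +
    geom_boxplot(width = 0.1) +
    scale_y_log10() +
    scale_x_discrete(limits = c("A", "B", "R", "Q"))
  print(p)
}
```

With the shift applied, the zeros in category R show up as a mass at the bottom of its violin instead of being removed. The size of the constant decides how far below the rest of the data that mass sits, so it is worth stating the value in the figure legend.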