Hi,
I have RNA-seq count data. Differential expression analysis was done between groups A and B, and differentially expressed genes were selected based on FC > 2 and FDR < 0.05.
To show the expression of a specific differentially expressed gene in groups A and B, I converted the counts to logCPM and made a violin plot with a box plot inside it.
Samples Type GeneA
Sample1 B 14.82995162
Sample2 B 12.90512275
Sample3 B 9.196524783
Sample4 A 19.42866012
Sample5 A 19.70386922
Sample6 A 16.22906914
Sample7 A 12.48966785
Sample8 B 15.53280377
Sample9 A 9.345795955
Sample10 B 9.196524783
Sample11 B 9.196524783
Sample12 B 9.196524783
Sample13 A 9.434355615
Sample14 A 15.27604692
Sample15 A 18.90867329
Sample16 B 11.71503095
Sample17 B 13.7632545
Sample18 A 9.793864295
Sample19 B 9.196524783
Sample20 A 14.52562066
Sample21 A 13.85116605
Sample22 A 9.958492229
Sample23 A 17.57075876
Sample24 B 13.04499079
Sample25 B 15.33577937
Sample26 A 13.95849295
Sample27 B 9.196524783
Sample28 A 18.20524388
Sample29 B 17.7058873
Sample30 B 14.0199393
Sample31 A 16.21499069
Sample32 A 14.171432
Sample33 B 9.196524783
Sample34 B 9.196524783
Sample35 B 15.16648035
Sample36 B 12.9435081
Sample37 B 13.81971106
Sample38 B 15.82901231
I tried making a violin plot using ggviolin.
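library("ggpubr")
# eg: data frame holding the table above (columns Samples, Type, GeneA)
pdf("eg.pdf", width = 5, height = 5)
p <- ggviolin(eg, x = "Type", y = "GeneA", fill = "Type",
              palette = c("#00AFBB", "#FC4E07"),
              add = "boxplot", add.params = list(fill = "white"),
              order = c("A", "B"),
              ylab = "GeneA (logCPM)", xlab = "Groups")
# print() makes sure the plot is drawn even when the script is sourced
print(ggpar(p, ylim = c(0, 30)))
dev.off()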
The plot looks like this: violinplot. I have a few questions.
Why is there no lower quartile for group B in the plot?
Is there a way to show the individual points on the violin plot, and also to highlight outliers? For example, I'm interested in highlighting Sample10.
Thank you.
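The black line in the box plot is the median. By definition, about half of the points lie on each side of it, so what differs between A and B is how the points are spread around it. In the A data, the points are spread fairly evenly on both sides of the median. In the B data, the points below the median pile up at the bottom: 8 of the 21 values are tied at the lowest value (9.196524783), so the lower quartile coincides with the minimum and there is no lower whisker left to draw. In R, you can do a small exercise to validate this: calculate the median and quartiles, then count the points above the median, below the median, and at the minimum. Do this for both A and B and you should get an idea of the distribution (which is already evident from the box plot). There is also another trend in the B data, which looks bimodal (IMO). To show the individual points, try a beeswarm plot along with the violin plot instead of jitter.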
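The exercise can be sketched in base R as follows. This is only a minimal sketch: the data frame name eg and its columns follow the question, the values are copied from the posted table, and summarise_group is a helper name made up for this example.

```r
# Data copied from the question (Sample1 to Sample38, in order).
eg <- data.frame(
  Samples = paste0("Sample", 1:38),
  Type = strsplit(paste0("BBBAAAABAB", "BBAAABBABA",
                         "AAABBABABB", "AABBBBBB"), "")[[1]],
  GeneA = c(14.82995162, 12.90512275, 9.196524783, 19.42866012, 19.70386922,
            16.22906914, 12.48966785, 15.53280377, 9.345795955, 9.196524783,
            9.196524783, 9.196524783, 9.434355615, 15.27604692, 18.90867329,
            11.71503095, 13.7632545, 9.793864295, 9.196524783, 14.52562066,
            13.85116605, 9.958492229, 17.57075876, 13.04499079, 15.33577937,
            13.95849295, 9.196524783, 18.20524388, 17.7058873, 14.0199393,
            16.21499069, 14.171432, 9.196524783, 9.196524783, 15.16648035,
            12.9435081, 13.81971106, 15.82901231)
)

# Hypothetical helper: group size, quartiles, and counts around the median.
summarise_group <- function(x) {
  c(n = length(x),
    min = min(x),
    q1 = unname(quantile(x, 0.25)),
    median = median(x),
    q3 = unname(quantile(x, 0.75)),
    max = max(x),
    n_below_median = sum(x < median(x)),
    n_above_median = sum(x > median(x)),
    n_at_min = sum(x == min(x)))
}

# One column per group (A, B), one row per statistic.
group_summary <- sapply(split(eg$GeneA, eg$Type), summarise_group)
print(round(group_summary, 3))
```

With the posted values, group B has 21 samples, 8 of which share the minimum value, and its lower quartile equals that minimum.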
You can also try ggjoy (for joyplots) to view the bimodal trend in the B data.
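For the second question (showing the points and highlighting Sample10), here is one possible sketch. It uses plain ggplot2 rather than ggviolin, which is a swap made for this example and not the code from the question. It assumes ggplot2 is installed, uses ggbeeswarm for the point layout only if that package is available, and falls back to jitter otherwise. The Highlight column, the point_position name, and the blue highlight colour are choices made for this example.

```r
library(ggplot2)

# Data copied from the question (Sample1 to Sample38, in order).
eg <- data.frame(
  Samples = paste0("Sample", 1:38),
  Type = strsplit(paste0("BBBAAAABAB", "BBAAABBABA",
                         "AAABBABABB", "AABBBBBB"), "")[[1]],
  GeneA = c(14.82995162, 12.90512275, 9.196524783, 19.42866012, 19.70386922,
            16.22906914, 12.48966785, 15.53280377, 9.345795955, 9.196524783,
            9.196524783, 9.196524783, 9.434355615, 15.27604692, 18.90867329,
            11.71503095, 13.7632545, 9.793864295, 9.196524783, 14.52562066,
            13.85116605, 9.958492229, 17.57075876, 13.04499079, 15.33577937,
            13.95849295, 9.196524783, 18.20524388, 17.7058873, 14.0199393,
            16.21499069, 14.171432, 9.196524783, 9.196524783, 15.16648035,
            12.9435081, 13.81971106, 15.82901231)
)

# Flag the sample to highlight (Sample10, as asked in the question).
eg$Highlight <- ifelse(eg$Samples == "Sample10", "Sample10", "other")

# Beeswarm layout if ggbeeswarm is installed, otherwise horizontal jitter.
point_position <- if (requireNamespace("ggbeeswarm", quietly = TRUE)) {
  ggbeeswarm::position_beeswarm(cex = 2)
} else {
  position_jitter(width = 0.15, height = 0, seed = 1)
}

p <- ggplot(eg, aes(x = Type, y = GeneA)) +
  geom_violin(aes(fill = Type), trim = FALSE) +
  # Hide the box plot's own outlier marks; every point is drawn below.
  geom_boxplot(width = 0.1, fill = "white", outlier.shape = NA) +
  # group = Type keeps the point layout per group, not per colour.
  geom_point(aes(colour = Highlight, size = Highlight, group = Type),
             position = point_position) +
  scale_fill_manual(values = c(A = "#00AFBB", B = "#FC4E07")) +
  scale_colour_manual(values = c(other = "black", Sample10 = "blue")) +
  scale_size_manual(values = c(other = 1.5, Sample10 = 3)) +
  coord_cartesian(ylim = c(0, 30)) +
  labs(x = "Groups", y = "GeneA (logCPM)", colour = NULL, size = NULL) +
  theme_classic()

print(p)
```

Sample10 is one of the eight B samples tied at the lowest value, so it will sit in the cluster at the bottom of the B violin. To mark several samples, the same Highlight column can be filled from a vector of sample names.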