Question

Plot to show expression of a gene between two groups

6.3 years ago

Biologist ▴ 290

Hi,

I have RNA-seq count data. Differential expression analysis was done between groups A and B, and differentially expressed genes were selected based on FC > 2 and FDR < 0.05.

To show the expression of a specific differentially expressed gene in groups A and B, I converted the counts to logCPM values and made a violin plot with a box plot inside it.

    Samples  Type   GeneA
    Sample1    B    14.82995162
    Sample2    B    12.90512275
    Sample3    B    9.196524783
    Sample4    A    19.42866012
    Sample5    A    19.70386922
    Sample6    A    16.22906914
    Sample7    A    12.48966785
    Sample8    B    15.53280377
    Sample9    A    9.345795955
    Sample10    B   9.196524783
    Sample11    B   9.196524783
    Sample12    B   9.196524783
    Sample13    A   9.434355615
    Sample14    A   15.27604692
    Sample15    A   18.90867329
    Sample16    B   11.71503095
    Sample17    B   13.7632545
    Sample18    A   9.793864295
    Sample19    B   9.196524783
    Sample20    A   14.52562066
    Sample21    A   13.85116605
    Sample22    A   9.958492229
    Sample23    A   17.57075876
    Sample24    B   13.04499079
    Sample25    B   15.33577937
    Sample26    A   13.95849295
    Sample27    B   9.196524783
    Sample28    A   18.20524388
    Sample29    B   17.7058873
    Sample30    B   14.0199393
    Sample31    A   16.21499069
    Sample32    A   14.171432
    Sample33    B   9.196524783
    Sample34    B   9.196524783
    Sample35    B   15.16648035
    Sample36    B   12.9435081
    Sample37    B   13.81971106
    Sample38    B   15.82901231

I tried making a violin plot using ggviolin.

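    library("ggpubr")
    # eg: data frame holding the table above (columns Samples, Type, GeneA)
    pdf("eg.pdf", width = 5, height = 5)
    p <- ggviolin(eg, x = "Type", y = "GeneA", fill = "Type",
                  palette = c("#00AFBB", "#FC4E07"),
                  add = "boxplot", add.params = list(fill = "white"),
                  order = c("A", "B"),
                  ylab = "GeneA (logCPM)", xlab = "Groups")
    # print() so the plot is written to the PDF even when the script is sourced
    print(ggpar(p, ylim = c(0, 30)))
    dev.off()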

The plot looks like this: violinplot. I have a few questions.

Why is there no lower quartile shown for group B in the plot?

Is there a way to show the individual points on the violin plot, and also to highlight the outliers? For example, I'm interested in highlighting Sample10.

Thank you.

RNA-Seq r gene expression plot • 4.4k views

updated 6.3 years ago by EagleEye 7.6k • written 6.3 years ago by Biologist ▴ 290

The black line in the box plot is the median. The two groups are skewed differently around it: in the A data the points stretch out above the median, and in the B data it is the other way round, with the values below the median piling up at the low end. In R, you can do a small exercise to check this: calculate the median, then count the points above and below it and their percentages. Do this for both A and B and you should get an idea of the distribution (which is already evident from the box plot). There is also another trend in the B data, which looks bimodal (IMO). Try a beeswarm plot together with the violin plot instead of jitter.

You can also try ggjoy (joyplots; the package has since been superseded by ggridges) to view the bimodal trend in the B data.
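The exercise described above could be sketched in base R roughly as follows. This assumes the posted table is available as a data frame `eg` with columns `Samples`, `Type` and `GeneA` (rebuilt here from the values in the question so the snippet runs on its own); the helper name `median_split` is hypothetical.

```r
# Rebuild the posted data (Sample1..Sample38) so the example is self-contained.
eg <- data.frame(
  Samples = paste0("Sample", 1:38),
  Type    = strsplit("BBBAAAABABBBAAABBABAAAABBABABBAABBBBBB", "")[[1]],
  GeneA   = c(14.82995162, 12.90512275, 9.196524783, 19.42866012, 19.70386922,
              16.22906914, 12.48966785, 15.53280377, 9.345795955, 9.196524783,
              9.196524783, 9.196524783, 9.434355615, 15.27604692, 18.90867329,
              11.71503095, 13.7632545, 9.793864295, 9.196524783, 14.52562066,
              13.85116605, 9.958492229, 17.57075876, 13.04499079, 15.33577937,
              13.95849295, 9.196524783, 18.20524388, 17.7058873, 14.0199393,
              16.21499069, 14.171432, 9.196524783, 9.196524783, 15.16648035,
              12.9435081, 13.81971106, 15.82901231)
)

# Hypothetical helper: median plus counts of values below, at, and above it.
median_split <- function(x) {
  m <- median(x)
  c(n = length(x), median = m,
    below = sum(x < m), at = sum(x == m), above = sum(x > m))
}

# One row per group (A, B), one column per statistic.
split_by_group <- t(sapply(split(eg$GeneA, eg$Type), median_split))
print(split_by_group)

# Percentage of each group's values strictly below / above its own median.
pct <- round(100 * split_by_group[, c("below", "above")] / split_by_group[, "n"], 1)
print(pct)
```

Since the median splits each group in half by construction, the counts on either side come out equal here (8 and 8 for A, 10 and 10 for B). The difference between the groups is therefore easier to see in how the values are spread on each side, for instance by counting ties at the minimum.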

6.3 years ago by cpad0112 21k

score 2 · Answer 1 · 2018-10-04

6.3 years ago

EagleEye 7.6k

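A simple way to plot this is:

    # Box plot per group; outcol = "white" hides the outlier symbols so that
    # points are not drawn twice once the strip chart is overlaid
    boxplot(GeneA ~ Type, data = eg, ylab = "GeneA (logCPM)", outcol = "white")

    # Overlay every sample as a jittered point
    stripchart(GeneA ~ Type, data = eg, vertical = TRUE, method = "jitter",
               add = TRUE, pch = 20, col = "blue")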
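To highlight one particular sample on top of a plot like this, one option is to redraw that sample with `points()` and label it with `text()`. The sketch below is only one way to do it and rests on a few assumptions: `eg` is rebuilt from the table in the question, `Type` is a factor with levels A and B (so the boxes sit at x = 1 and x = 2), and the variable names `highlight`, `hl` and `x_pos` are made up for the example.

```r
# Rebuild the posted data (Sample1..Sample38) so the example is self-contained.
eg <- data.frame(
  Samples = paste0("Sample", 1:38),
  Type    = strsplit("BBBAAAABABBBAAABBABAAAABBABABBAABBBBBB", "")[[1]],
  GeneA   = c(14.82995162, 12.90512275, 9.196524783, 19.42866012, 19.70386922,
              16.22906914, 12.48966785, 15.53280377, 9.345795955, 9.196524783,
              9.196524783, 9.196524783, 9.434355615, 15.27604692, 18.90867329,
              11.71503095, 13.7632545, 9.793864295, 9.196524783, 14.52562066,
              13.85116605, 9.958492229, 17.57075876, 13.04499079, 15.33577937,
              13.95849295, 9.196524783, 18.20524388, 17.7058873, 14.0199393,
              16.21499069, 14.171432, 9.196524783, 9.196524783, 15.16648035,
              12.9435081, 13.81971106, 15.82901231)
)
eg$Type <- factor(eg$Type, levels = c("A", "B"))

highlight <- "Sample10"                 # sample of interest from the question
hl <- eg[eg$Samples == highlight, ]     # one-row data frame for that sample

boxplot(GeneA ~ Type, data = eg, ylab = "GeneA (logCPM)", outcol = "white")
stripchart(GeneA ~ Type, data = eg, vertical = TRUE, method = "jitter",
           add = TRUE, pch = 20, col = "blue")

# Boxes are drawn at x = 1, 2, ... in factor-level order, so the integer
# code of the factor gives the x position of the sample's group.
x_pos <- as.integer(hl$Type)
points(x_pos, hl$GeneA, pch = 21, bg = "red", cex = 1.5)
text(x_pos, hl$GeneA, labels = hl$Samples, pos = 4, col = "red")
```

The highlighted marker is placed at the centre of the box rather than on top of the jittered point. In this data set Sample10 also shares its value (9.196524783) with seven other group B samples, so the marker will overlap their points.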

Yes, this is possible with jitter, thank you. But could you please tell me why the lower whisker is missing for group B, and how to highlight a specific sample point in the box plot?

6.3 years ago by Biologist ▴ 290

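Biologist, refer to the boxplot.stats function. The answer to your question is below:

    > boxplot.stats(eg[eg$Type=="B",]$GeneA)$stats
    [1]  9.196525  9.196525 12.943508 14.829952 17.705887

The five values are the end of the lower whisker, the lower hinge, the median, the upper hinge and the end of the upper whisker. The first two are identical (9.196525), so the lower whisker has zero length and coincides with the bottom edge of the box.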
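To see where the coinciding values come from, it may help to count how many group B samples are tied at the minimum. The snippet below is a small sketch using only the group B values from the table in the question; the variable names `b`, `st` and `tied_at_min` are chosen for the example.

```r
# logCPM values of the 21 group B samples, in the order they appear in the table.
b <- c(14.82995162, 12.90512275, 9.196524783, 15.53280377, 9.196524783,
       9.196524783, 9.196524783, 11.71503095, 13.7632545, 9.196524783,
       13.04499079, 15.33577937, 9.196524783, 17.7058873, 14.0199393,
       9.196524783, 9.196524783, 15.16648035, 12.9435081, 13.81971106,
       15.82901231)

st <- boxplot.stats(b)

# Number and fraction of samples sitting exactly at the minimum.
tied_at_min <- sum(b == min(b))
print(tied_at_min)                   # 8
print(tied_at_min / length(b))       # about 0.38, i.e. more than a quarter

# Lower whisker end and lower hinge are the same value.
print(st$stats[1] == st$stats[2])    # TRUE

# Points beyond 1.5 * IQR from the hinges (what boxplot() draws as outliers).
print(st$out)                        # numeric(0)
```

More than a quarter of the group B values are tied at the minimum, so the lower hinge falls on that same value. With the default 1.5 × IQR rule no group B value is flagged as an outlier, which means Sample10 would have to be highlighted by name rather than through the box plot's outlier symbols.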

6.3 years ago by cpad0112 21k