How to make a boxplot of a specific row, grouped by sample group, in R
3.4 years ago
Peter • 0

I am new to R and would like help making a boxplot by group. I have 2 files. File 1, test.txt, holds the values of the samples (gene expression):


gene  group1.1  group1.2  group2.1  group2.2
a1    12        13        12        12
a2    2         3         25        31
a3    24        30        34        22
a4    10        11        23        24

and file 2, design.txt, is the sample design:


file      condition
group1.1  group1
group1.2  group1
group2.1  group2
group2.2  group2

I want to make a boxplot in R for one specific row, for example a1, split into the 2 groups (group1 and group2). The output should look like boxplot-a1.

How can I do this directly from the 2 files? I think I am doing it the stupid way:

dt1 <- read.delim("test.txt", sep="\t", header = TRUE)
dg <- read.delim("design.txt", sep="\t", header = TRUE)

I made a new file by copying and transposing:

gene  name      group   expression
a1    Group1.1  group1  12
a1    Group1.2  group1  13
a1    Group2.1  group2  12
a1    Group2.2  group2  12.5
a2    Group1.1  group1  2
a2    Group1.2  group1  3
a2    Group2.1  group2  25
a2    Group2.2  group2  31
...

dt <- read.delim("test_t.csv", sep="\t", header = TRUE)

library(ggplot2)
a1 <- dt[dt$gene %in% "a1",]
# Inside aes(), refer to columns by bare name rather than a1$column.
ggplot(a1, aes(x=group, y=expression)) +
    labs(title = "Expression A1", x = "Group", y = "Expression") +
    stat_boxplot(geom = "errorbar", width = 0.15) +
    geom_boxplot()

One more question: if I have hundreds of genes (a1, a2, a3, and so on), how can I use a for loop to make all the individual boxplots? Thank you so much for your help!

boxplot expression group R • 2.2k views
3.4 years ago

Here is some example data.

# Counts table.
counts <- as.data.frame(replicate(4, sample(1:100, 6)))
colnames(counts) <- sprintf("group%s", apply(expand.grid(1:2, 1:2), 1, paste, collapse="."))
counts$gene <- sprintf("ENSG%08d", sample(1e2:1e5, 6))

> counts
  group1.1 group2.1 group1.2 group2.2         gene
1       38       32       62       72 ENSG00078354
2       14        8        5       67 ENSG00070937
3       67       10       63       48 ENSG00092073
4       96       76       31       31 ENSG00042154
5       85       64       35       12 ENSG00089990
6       99       94       80       54 ENSG00064588

# Design table.
# Parenthesize ncol(counts)-1: 1:ncol(counts)-1 evaluates as (1:ncol(counts))-1.
design <- data.frame(file=colnames(counts)[1:(ncol(counts)-1)])
design$condition <- sub("\\.\\d+", "", design$file)

> design
      file condition
1 group1.1    group1
2 group2.1    group2
3 group1.2    group1
4 group2.2    group2
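
With the real files from the question, the same two tables would presumably come straight from read.delim(). A minimal sketch, using inline text as a stand-in for test.txt and design.txt so that it runs on its own (the values are copied from the question; the final check is my own addition, not part of the original answer):

```r
# Inline stand-ins for test.txt and design.txt. With the real files, use
# read.delim("test.txt") and read.delim("design.txt") instead.
counts <- read.delim(text = "gene\tgroup1.1\tgroup1.2\tgroup2.1\tgroup2.2
a1\t12\t13\t12\t12
a2\t2\t3\t25\t31
a3\t24\t30\t34\t22
a4\t10\t11\t23\t24", stringsAsFactors = FALSE)

design <- read.delim(text = "file\tcondition
group1.1\tgroup1
group1.2\tgroup1
group2.1\tgroup2
group2.2\tgroup2", stringsAsFactors = FALSE)

# Every sample named in the design table should be a column of the counts table;
# otherwise a later join on the sample name would silently produce NAs.
stopifnot(all(design$file %in% colnames(counts)))
```

Here the gene column comes first rather than last, which should not matter for any step that selects the sample columns by name.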

Pivot the counts to long format and then join the counts and design tables.


library(dplyr)
library(tidyr)

df <- counts %>%
  pivot_longer(!gene, names_to="file", values_to="expression") %>%
  left_join(design, by="file")

> df
# A tibble: 24 x 4
   gene         file     expression condition
   <chr>        <chr>         <int> <chr>    
 1 ENSG00078354 group1.1         38 group1   
 2 ENSG00078354 group2.1         32 group2   
 3 ENSG00078354 group1.2         62 group1   
 4 ENSG00078354 group2.2         72 group2   
 5 ENSG00070937 group1.1         14 group1   
 6 ENSG00070937 group2.1          8 group2   
 7 ENSG00070937 group1.2          5 group1   
 8 ENSG00070937 group2.2         67 group2   
 9 ENSG00092073 group1.1         67 group1   
10 ENSG00092073 group2.1         10 group2   
# … with 14 more rows
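
If the tidyverse is not installed, roughly the same long table can be built in base R. This is only a sketch: it swaps in stack() and merge() for the pivot and join steps, and it uses a two-gene stand-in for the counts table so that it is self-contained. The names sample_cols, long_df and df_base are made up for the illustration.

```r
# Small stand-in tables with the same layout as counts and design above.
counts <- data.frame(
  group1.1 = c(38, 14), group2.1 = c(32, 8),
  group1.2 = c(62, 5),  group2.2 = c(72, 67),
  gene = c("ENSG00078354", "ENSG00070937"),
  stringsAsFactors = FALSE
)
design <- data.frame(
  file = c("group1.1", "group2.1", "group1.2", "group2.2"),
  condition = c("group1", "group2", "group1", "group2"),
  stringsAsFactors = FALSE
)

sample_cols <- setdiff(colnames(counts), "gene")

# stack() puts all sample columns into one 'values' column and records the
# source column name in 'ind', one column after another.
long <- stack(counts[sample_cols])
long_df <- data.frame(
  gene = rep(counts$gene, times = length(sample_cols)),
  file = as.character(long$ind),
  expression = long$values,
  stringsAsFactors = FALSE
)

# merge() stands in for the join; all.x = TRUE keeps samples with no design row.
df_base <- merge(long_df, design, by = "file", all.x = TRUE)
```

The columns match the joined table, but merge() sorts by the key, so the row order will differ.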

You can now plot whichever gene you want.

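library(ggplot2)

# Use a name that differs from the 'gene' column; filter(gene == gene) would
# compare the column with itself and keep every row.
target_gene <- "ENSG00078354"

df %>%
  dplyr::filter(gene == target_gene) %>%
  ggplot(aes(x=condition, y=expression)) +
    geom_boxplot()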
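
For the second part of the question (one boxplot per gene for hundreds of genes), a for loop over the unique gene names should be enough. A minimal sketch, assuming a long table with gene, condition and expression columns like df; the small data frame, the plots list, and the boxplot_<gene>.pdf file names are stand-ins I made up for the illustration:

```r
library(ggplot2)

# Small long-format stand-in for df, using values from the question.
df <- data.frame(
  gene = rep(c("a1", "a2"), each = 4),
  condition = rep(c("group1", "group1", "group2", "group2"), times = 2),
  expression = c(12, 13, 12, 12, 2, 3, 25, 31),
  stringsAsFactors = FALSE
)

# Build one boxplot per gene and keep them in a list named by gene.
plots <- list()
for (g in unique(df$gene)) {
  plots[[g]] <- ggplot(df[df$gene == g, ], aes(x = condition, y = expression)) +
    geom_boxplot() +
    labs(title = paste("Expression", g), x = "Group", y = "Expression")
}

# Save each plot to its own PDF. tempdir() is used here only so the sketch
# does not write into the working directory; point out_dir wherever you like.
out_dir <- tempdir()
for (g in names(plots)) {
  ggsave(file.path(out_dir, paste0("boxplot_", g, ".pdf")), plots[[g]],
         width = 4, height = 4)
}
```

Printing plots[["a1"]] shows a single plot interactively. With only a handful of genes, facet_wrap(~ gene) on the full table may be simpler than a loop.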



It's so great. Thank you so much for your help rpolicastro

