How to make a box plot of a specific row, grouped by sample condition, in R
3.2 years ago
Peter • 0

I am new to R and would like help making a boxplot by group. I have two files. File 1, test.txt, holds the gene expression values for the samples:

data

gene  group1.1  group1.2  group2.1  group2.2
a1    12        13        12        12
a2    2         3         25        31
a3    24        30        34        22
a4    10        11        23        24

and file 2, design.txt, is the sample design:

design

file      condition
group1.1  group1
group1.2  group1
group2.1  group2
group2.2  group2

I want to make a boxplot in R for one specific row, for example a1, with the samples split into two groups, group1 and group2. The desired output is shown in the image boxplot-a1.

How can I do this directly from the two files? I think I am doing it the stupid way:

dt1 <- read.delim("test.txt", sep="\t", header = TRUE)
dg <- read.delim("design.txt", sep="\t", header = TRUE)

I made a new file, test_t.csv, by copying and transposing the data by hand:


gene  name      group   expression
a1    Group1.1  group1  12
a1    Group1.2  group1  13
a1    Group2.1  group2  12
a1    Group2.2  group2  12.5
a2    Group1.1  group1  2
a2    Group1.2  group1  3
a2    Group2.1  group2  25
a2    Group2.2  group2  31
...

dt <- read.delim("test_t.csv", sep="\t", header = TRUE)

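library(ggplot2)

# Keep only the rows for gene a1.
a1 <- dt[dt$gene %in% "a1",]
# Refer to columns by bare name inside aes(), not with a1$...
ggplot(a1, aes(x = group, y = expression)) +
    labs(title = "Expression A1", x = "Group", y = "Expression") +
    stat_boxplot(geom = "errorbar", width = 0.15) +
    geom_boxplot()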

One more question: if I have hundreds of genes (a1, a2, a3, a4, ..., an), how can I use a for loop to make an individual boxplot for each one? Thank you so much for your help!

3.2 years ago

Here is some example data.

# Counts table.
counts <- as.data.frame(replicate(4, sample(1:100, 6)))
colnames(counts) <- sprintf("group%s", apply(expand.grid(1:2, 1:2), 1, paste, collapse="."))
counts$gene <- sprintf("ENSG%08d", sample(1e2:1e5, 6))

> counts
  group1.1 group2.1 group1.2 group2.2         gene
1       38       32       62       72 ENSG00078354
2       14        8        5       67 ENSG00070937
3       67       10       63       48 ENSG00092073
4       96       76       31       31 ENSG00042154
5       85       64       35       12 ENSG00089990
6       99       94       80       54 ENSG00064588

# Design table.
# Every column except `gene` is a sample.
design <- data.frame(file=setdiff(colnames(counts), "gene"))
design$condition <- sub("\\.\\d+", "", design$file)

> design
      file condition
1 group1.1    group1
2 group2.1    group2
3 group1.2    group1
4 group2.2    group2

Pivot the counts to long format and then join the counts and design tables.

library("tidyverse")

df <- counts %>%
  pivot_longer(!gene, names_to="file", values_to="expression") %>%
  left_join(design, by="file")

> df
# A tibble: 24 x 4
   gene         file     expression condition
   <chr>        <chr>         <int> <chr>    
 1 ENSG00078354 group1.1         38 group1   
 2 ENSG00078354 group2.1         32 group2   
 3 ENSG00078354 group1.2         62 group1   
 4 ENSG00078354 group2.2         72 group2   
 5 ENSG00070937 group1.1         14 group1   
 6 ENSG00070937 group2.1          8 group2   
 7 ENSG00070937 group1.2          5 group1   
 8 ENSG00070937 group2.2         67 group2   
 9 ENSG00092073 group1.1         67 group1   
10 ENSG00092073 group2.1         10 group2   
# … with 14 more rows
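
The same pivot and join should carry over to the tables from the question. Below is a minimal sketch, assuming test.txt and design.txt are tab-separated with the headers shown in the question. The data is inlined with `text =` only so the example runs on its own, and the names `counts_txt`, `design_txt` and `expr_long` are illustrative, not from the original post.

```r
library(tidyr)
library(dplyr)

# The question's two tables, inlined so the sketch is self-contained.
# With the real files, use read.delim("test.txt") and read.delim("design.txt").
counts_txt <- "gene\tgroup1.1\tgroup1.2\tgroup2.1\tgroup2.2
a1\t12\t13\t12\t12
a2\t2\t3\t25\t31
a3\t24\t30\t34\t22
a4\t10\t11\t23\t24"

design_txt <- "file\tcondition
group1.1\tgroup1
group1.2\tgroup1
group2.1\tgroup2
group2.2\tgroup2"

dt1 <- read.delim(text = counts_txt, stringsAsFactors = FALSE)
dg  <- read.delim(text = design_txt, stringsAsFactors = FALSE)

# One row per gene/sample pair, then attach each sample's condition.
expr_long <- dt1 %>%
  pivot_longer(!gene, names_to = "file", values_to = "expression") %>%
  left_join(dg, by = "file")
```

This replaces the manual copy-and-transpose step: `expr_long` has the columns gene, file, expression and condition, so it can be filtered and plotted like `df`.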

You can now plot whichever gene you want.

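# Use a variable name that differs from the `gene` column. Inside filter(),
# `gene == gene` compares the column with itself and keeps every row.
gene_id <- "ENSG00078354"

df %>%
  dplyr::filter(gene == gene_id) %>%
  ggplot(aes(x=condition, y=expression)) +
    geom_boxplot() +
    ggtitle(gene_id)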

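
The question also asked how to loop over hundreds of genes. One possible approach, sketched here on a small hypothetical long-format table with the same columns as the joined data (values taken from the question's a1 to a3 rows), is to build one plot per gene in a for loop and save each to its own file. The names `expr_long`, `out_dir` and `plots`, the PDF format and the file naming are assumptions for illustration.

```r
library(ggplot2)

# Hypothetical long-format data: one row per gene/sample pair.
expr_long <- data.frame(
  gene       = rep(c("a1", "a2", "a3"), each = 4),
  file       = rep(c("group1.1", "group1.2", "group2.1", "group2.2"), times = 3),
  condition  = rep(c("group1", "group1", "group2", "group2"), times = 3),
  expression = c(12, 13, 12, 12,  2, 3, 25, 31,  24, 30, 34, 22)
)

out_dir <- tempdir()  # assumption: replace with the directory you want
plots <- list()

for (g in unique(expr_long$gene)) {
  # Subset to the current gene with base R indexing.
  one_gene <- expr_long[expr_long$gene == g, ]

  p <- ggplot(one_gene, aes(x = condition, y = expression)) +
    geom_boxplot() +
    ggtitle(g)

  plots[[g]] <- p
  # One PDF per gene, e.g. boxplot_a1.pdf
  ggsave(file.path(out_dir, paste0("boxplot_", g, ".pdf")),
         plot = p, width = 4, height = 4)
}
```

For a modest number of genes, adding `facet_wrap(~ gene, scales = "free_y")` to a single plot of the unfiltered table may be simpler than a loop, since it draws one panel per gene.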

This is great. Thank you so much for your help, rpolicastro.
