Plotting a line graph using ggplot2 for multiple columns (genes)
5.0 years ago

Hi,

I am using the ggplot2 library to plot a line graph from the data frame given below. The rows contain the timepoints (1, 2, 3, 4, 5, 6) and the columns contain the values under the headers (Gene A, Gene B, Gene C, Gene D). With the code below, I could generate 4 separate plots, one for each gene. Please let me know if there is a way to include all 4 genes in one plot instead of individual ones. A screenshot of the individual plots is attached for reference.

B1_v1_sorted

Timepoints  Gene A      Gene B     Gene C     Gene D
1           -0.757847   -0.404452  2.46365    -1.174
2           -0.0461316  -0.276019  2.87773    -1.01407
3           -1.35582    -0.47392   2.31436    -3.21155
4           -1.11589    -0.653023  1.27958    -1.32141
5           -1.63586    -1.64114   0.786856   -1.24327
6           -2.29268    -2.04769   -0.826819  -4.12988

library(ggplot2)
library(hrbrthemes)

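# Build one line plot per gene column (every column except Timepoints).
p_v1 <- lapply(
  colnames(B1_v1_sorted)[2:ncol(B1_v1_sorted)],
  # .data[[col]] replaces the deprecated aes_string() and also handles
  # column names that contain spaces; ylab(col) keeps the gene name as label
  function(col) ggplot(B1_v1_sorted, aes(x = Timepoints, y = .data[[col]])) +
    geom_line() + ylab(col))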

p_v1


library(cowplot)
plot_grid(
  p_v1[[1]], p_v1[[2]], p_v1[[3]], p_v1[[4]],
  ncol = 4,
  labels = colnames(B1_v1_sorted)[2:5])

Thank you,

Toufiq

ggplot2 line graph R • 8.3k views

Explained in this post. Kindly refer.


@Chirag Parsania, thank you for the suggestions. I could get the expected plots.

I have two questions:

  1. This plot uses only a few IDs (last column). I have more than a hundred IDs, and the saved PDF looks fuzzy. Is there a way to include all the plots for every ID by setting margins in the PDF report?

  2. Is there a way to include the gene names inside each ID box? A separate legend takes up more space and is not very informative for large data.

dput(head(final))

structure(list(Genes = structure(c(1L, 1L, 1L, 1L, 2L, 2L), .Label = c("Gene_A", "Gene_B", "Gene_C", "Gene_D", "Gene_E", "Gene_F", "Gene_G", "Gene_H", "Gene_I", "Gene_K", "Gene_L", "Gene_M", "Gene_N", "Gene_O", "Gene_P", "Gene_R", "Gene_S", "Gene_T"), class = "factor"), Timepoints = c("1", "2", "3", "5", "1", "2"), value = c("-2.05066", "-0.657222", "-1.49477", "-1.80191", "-8.35787", "-9.52402"), X5 = structure(c(Gene_A = 1L, Gene_A = 1L, Gene_A = 1L, Gene_A = 1L, Gene_B = 2L, Gene_B = 2L ), .Label = c("A1.1", "A1.2", "A1.3", "A1.4", "A1.5", "A1.6", "A1.9", "A2.2", "A2.6"), class = "factor")), row.names = c(NA, 6L), class = "data.frame")

final %>%
  ggplot(aes(x = Timepoints, y = value, group = Genes)) +
  geom_point() +
  geom_line(aes(col = as.character(Genes)), alpha = 1) +
  theme_bw() +
  theme(legend.position = "right",
        axis.text.x = element_text(angle = 90, vjust = 0.4)) +
  facet_wrap(~X5)
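One possible way to approach both questions, sketched under assumptions rather than tested on the real data: split the IDs into chunks, draw each chunk on its own PDF page so the panels stay readable, and label each line with its gene name instead of using a legend. The data frame below is a toy stand-in with random values and hypothetical IDs (ID1 to ID10); `per_page`, `pages` and `out` are illustrative names, not part of the original code.

```r
library(ggplot2)

# Toy stand-in for `final`: hypothetical IDs and random values, illustration only.
set.seed(1)
final <- expand.grid(Timepoints = 1:4,
                     Genes = c("Gene_A", "Gene_B"),
                     X5 = paste0("ID", 1:10))
final$value <- rnorm(nrow(final))

# Split the IDs into chunks of `per_page` so each PDF page has few panels.
ids <- unique(as.character(final$X5))
per_page <- 4
pages <- split(ids, ceiling(seq_along(ids) / per_page))

out <- file.path(tempdir(), "facets_by_id.pdf")
pdf(out, width = 8, height = 8)   # a multi-page vector PDF, one page per chunk
for (page_ids in pages) {
  page_df <- final[final$X5 %in% page_ids, ]
  # Rows at the last timepoint, used to place the gene-name labels.
  last_pts <- page_df[page_df$Timepoints == max(page_df$Timepoints), ]
  p <- ggplot(page_df, aes(x = Timepoints, y = value,
                           colour = Genes, group = Genes)) +
    geom_line() +
    geom_text(data = last_pts, aes(label = Genes),
              hjust = 1, vjust = -0.5, size = 3) +
    facet_wrap(~X5) +
    theme_bw() +
    theme(legend.position = "none")   # labels inside panels replace the legend
  print(p)                            # print() is required inside a loop
}
dev.off()
```

With 10 IDs and 4 per page this writes a 3-page PDF; the page size and the number of IDs per page are the knobs to adjust for a larger data set.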


In the plot you showed above, you should convert the value column to numeric rather than character. That will make the y-axis look much better than it does now. Also, if you have more than 100 genes to show, each with its name in the legend, a line plot is probably not a good idea. A heatmap would be a better choice.
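A minimal sketch of both suggestions, assuming a data frame shaped like the `dput()` output above. The four rows are copied from that output; the heatmap uses `geom_tile()`, which is one of several ways to draw one in ggplot2, and the colour choices are arbitrary.

```r
library(ggplot2)

# Four rows taken from the dput() output above; value is stored as character.
final <- data.frame(
  Genes      = c("Gene_A", "Gene_A", "Gene_B", "Gene_B"),
  Timepoints = c("1", "2", "1", "2"),
  value      = c("-2.05066", "-0.657222", "-8.35787", "-9.52402"),
  stringsAsFactors = FALSE
)

# Convert the character column to numeric so the axis/fill scale is continuous.
final$value <- as.numeric(final$value)

# Heatmap: one tile per gene and timepoint, coloured by value.
heat <- ggplot(final, aes(x = Timepoints, y = Genes, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0) +
  theme_bw()
heat
```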


@Chirag Parsania,

Thank you for the observation. Yes, the y-axis now looks better.

5.0 years ago

You should convert your input table from its current wide format to a long format using the gather function (which comes from tidyr, not dplyr):

library(dplyr)  # %>%
library(tidyr)  # gather()

B1_V1_sorted_long <-
   B1_v1_sorted %>%
   gather(c("Gene A", "Gene B", "Gene C", "Gene D"), key = "Gene", value = "Value")

Then plot with ggplot, using the group aesthetic:

B1_V1_sorted_long %>% 
 ggplot(aes(x= Timepoints,y=Value,col=Gene,group=Gene)) + 
 geom_line()

*FYI code not tested
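For reference, a self-contained sketch of the same reshape-then-plot idea using tidyr's `pivot_longer()`, which supersedes `gather()`. The data frame is rebuilt from the first three rows of the table in the question; `check.names = FALSE` and the object names `long` and `p` are assumptions made for illustration.

```r
library(ggplot2)
library(tidyr)

# First three rows of the question's table; check.names = FALSE keeps the
# spaces in the column names ("Gene A" rather than "Gene.A").
B1_v1_sorted <- data.frame(
  Timepoints = 1:3,
  "Gene A" = c(-0.757847, -0.0461316, -1.35582),
  "Gene B" = c(-0.404452, -0.276019, -0.47392),
  "Gene C" = c(2.46365, 2.87773, 2.31436),
  "Gene D" = c(-1.174, -1.01407, -3.21155),
  check.names = FALSE
)

# Wide -> long: every column except Timepoints becomes a (Gene, Value) pair.
long <- pivot_longer(B1_v1_sorted, cols = -Timepoints,
                     names_to = "Gene", values_to = "Value")

# One line per gene, all in a single panel.
p <- ggplot(long, aes(x = Timepoints, y = Value, colour = Gene, group = Gene)) +
  geom_line()
p
```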


@Nicolas Rosewick, thank you very much, this worked.

I have another question. I have 300 genes, grouped by a particular ID as shown below. How do I gather 300 genes with the same function? In the previous scenario there were only 4 genes, which were easy to type in. In addition, how do I extract only the genes belonging to a particular ID (ID column) and plot them? For instance, the x-axis is Timepoints and the y-axis is the value, matching only ID A1.2, then A1.6, and so on. Is it possible to loop over the IDs? Thank you.

Timepoints  Gene    Value   ID
1   Gene A  -0.404452   A1.2
2   Gene A  -0.276019   A1.2
1   Gene B  -0.47392    A1.2
2   Gene B  -0.653023   A1.2
1   Gene C  -1.64114    A1.2
2   Gene C  -2.04769    A1.2
1   Gene D  -0.865624   A1.2
2   Gene D  -2.16485    A1.2
1   Gene AA -0.0588112  A1.6
2   Gene AA -3.75268    A1.6
1   Gene BB -3.75268    A1.6
2   Gene BB -4.61661    A1.6
1   Gene CC -2.39765    A1.6
2   Gene CC -4.46908    A1.6
1   Gene DD -3.97157    A1.6
2   Gene DD -3.75419    A1.6
1   -   -   A1.8
2   -   -   A1.8
1   Gene n  -   A2.0
2   Gene n  -   A2.0
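One way this could be approached, as a sketch rather than a tested solution. For the reshaping step, `gather()` and `pivot_longer()` accept a negative selection such as `-Timepoints`, which selects every other column without typing 300 names. For the per-ID plots, the example below uses a few rows copied from the table above (the placeholder rows containing "-" are left out); `plot_one_id` and `plots_by_id` are hypothetical names.

```r
library(ggplot2)

# Rows copied from the table above (IDs A1.2 and A1.6, two genes each).
long <- data.frame(
  Timepoints = rep(1:2, times = 4),
  Gene  = rep(c("Gene A", "Gene B", "Gene AA", "Gene BB"), each = 2),
  Value = c(-0.404452, -0.276019, -0.47392, -0.653023,
            -0.0588112, -3.75268, -3.75268, -4.61661),
  ID    = rep(c("A1.2", "A1.6"), each = 4)
)

# Hypothetical helper: line plot for the rows belonging to one ID.
plot_one_id <- function(df) {
  ggplot(df, aes(x = Timepoints, y = Value, colour = Gene, group = Gene)) +
    geom_line() +
    ggtitle(unique(df$ID))
}

# split() gives one data frame per ID; lapply() builds one plot for each.
plots_by_id <- lapply(split(long, long$ID), plot_one_id)
plots_by_id[["A1.2"]]   # plot for a single ID

# Alternative: one figure with a panel per ID.
ggplot(long, aes(x = Timepoints, y = Value, colour = Gene, group = Gene)) +
  geom_line() +
  facet_wrap(~ID)
```

The list version is convenient when each ID should go to its own file or page, while `facet_wrap()` is shorter when all IDs fit in one figure.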