Question

How to adjust and align timepoints on the x-axis in ggplot2

Asked 5.2 years ago by mohammedtoufiq91

Hi,

I am working on a line plot using the ggplot2 library. I noticed that the data points are not aligned correctly with the timepoints on the x-axis. Below is the code I ran in R, along with an image of the line plot. As shown in the image, the data points for T5 are placed at T6, and T20 is not aligned either. Please let me know how to fix this issue.

Note: some data points in the middle are genuinely missing because they are not present in the data frame.

str(B1_Patient_Module_ID_sorted)
'data.frame':   5016 obs. of  4 variables:
 $ Genes     : Factor w/ 264 levels "ABHD5","ACOT4",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Timepoints: num  1 1 1 1 1 1 1 1 1 1 ...
 $ value     : num  -2.05 -8.36 -2.06 -3.84 -6.59 ...
 $ X20       : Factor w/ 66 levels "M10.1","M10.2",..: 53 59 53 53 44 6 29 12 29 19 ...
  ..- attr(*, "names")= chr  "ABHD5" "ACOT4" "ACTN4" "ACTR10" ...


pdf("B1_Module_v3.pdf", 7, 6)
for (i in seq(1, length(unique(B1_Patient_Module_ID_sorted$X20)), 1)) {
  print(ggplot(B1_Patient_Module_ID_sorted[B1_Patient_Module_ID_sorted$X20 %in% levels(B1_Patient_Module_ID_sorted$X20)[i:(i)], ], 
               aes(x =  Timepoints , y = value , group = Genes)) + 
          geom_point() + 
          geom_line(alpha = 1 , aes(col = Genes)) + 
          facet_wrap(~ X20) +
          scale_y_continuous(name = "-Delta Ct")+
          scale_x_discrete(name = "Timepoints", limits=c("1"= "T1", "2" = "T2",  "3" = "T3",  "5"= "T5", "6" = "T6", "7" = "T7", "8"= "T8", "9" = "T9",  "10" = "T10", "11"= "T11", "12" = "T12",  "13" = "T13", "14"= "T14", "15" = "T15", "16" = "T16", "17"= "T17", "18" = "T18", "19" = "T19", "20" = "T20"))+
          theme_classic()+
          theme(legend.position = "right") +
          theme(plot.title = element_text(lineheight=.8,size =14,face = "bold"),
                axis.text.x = element_text(colour="black",size=4.5,angle=0,hjust=0.5,vjust=0.5,face="plain"),
                axis.text.y = element_text(colour="black",size=4,angle=0,hjust=0,vjust=0.5,face="plain"),  
                axis.title.x = element_text(colour="black",size=15,angle=0,hjust=.5,vjust=0,face="plain"),
                axis.title.y = element_text(colour="black",size=14,angle=90,hjust=.5,vjust=.5,face="plain"),
                strip.background = element_blank(),
                legend.position = "right"))
}
dev.off()
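A likely cause, based on the code and the str() output above: Timepoints is numeric, and the limits vector in scale_x_discrete() jumps from "3" = "T3" to "5" = "T5", so it defines only 19 labels. The ggplot2 documentation states that numeric positions on a discrete scale start at 1 for the first level and increase by one per level, so a numeric value of 5 is drawn in the fifth slot, which is labelled T6, and 20 falls beyond the last label. The base-R sketch below is only a simplified model of that lookup (it is not ggplot2 code); the vector names are made up for illustration.

```r
# Labels in the order given by `limits` in the question's scale_x_discrete();
# note that "T4" is missing, so there are only 19 slots.
axis_labels <- c("T1", "T2", "T3", "T5", "T6", "T7", "T8", "T9", "T10",
                 "T11", "T12", "T13", "T14", "T15", "T16", "T17", "T18",
                 "T19", "T20")

# Numeric timepoints, as stored in the question's Timepoints column
timepoints <- c(1, 2, 3, 5, 10, 20)

# On a discrete axis, numeric value k sits in the k-th slot,
# so look up the label shown there (NA means beyond the last label)
shown_under <- axis_labels[timepoints]

print(data.frame(timepoint = timepoints, shown_under = shown_under))
```

Converting Timepoints to a factor whose levels cover every timepoint removes this mismatch, because each value is then matched to its label by name rather than by position.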

Thank you,

Toufiq


Hello mohammedtoufiq91!

It appears that your post has been cross-posted to another site: https://support.bioconductor.org/p/126702/

This is typically not recommended as it runs the risk of annoying people in both communities.

Reply by ATpoint, 5.2 years ago

@ ATpoint,

Apologies for the confusion. This will not be repeated going forward.

Reply by mohammedtoufiq91, 5.2 years ago

Can you post a snippet of the data used to generate that graph? For such an issue a reproducible example is required to properly troubleshoot.
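For example, a small self-contained snippet can be produced with base R's dput(). The toy data frame below is hypothetical and only mimics the column types shown by str() in the question; with the real data, the same two steps (take a few rows, then drop unused factor levels) keep the output short.

```r
# Hypothetical toy data with the same column types as in the question
toy <- data.frame(
  Genes = factor(c("Gene_A", "Gene_A", "Gene_B", "Gene_B")),
  Timepoints = c(1, 2, 1, 2),
  value = c(-2.05, -8.36, NA, -3.84)
)

# Take a few rows and drop factor levels that no longer occur,
# so the dput() output stays short
snippet <- droplevels(head(toy, 2))

# Paste the printed structure(...) into the post
dput(snippet)
```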

Reply by Mark, 5.2 years ago

Answer by zx8754, 5.2 years ago

Please provide example data.

Just guessing: try converting your timepoints into a factor, something like:

B1_Patient_Module_ID_sorted$Timepoints <- factor(B1_Patient_Module_ID_sorted$Timepoints,
                                                 levels = 1:20,
                                                 labels = paste0("T", 1:20))

Then we no longer need the scale_x_discrete(...) call.
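One caveat, assuming default ggplot2 behaviour: a discrete scale drops factor levels that have no rows, so timepoints without any data (for example T3 to T7 in the example data posted later in this thread) would disappear from the axis. If they should remain visible as empty slots, drop = FALSE can be passed to scale_x_discrete(). The toy data below is made up for illustration.

```r
library(ggplot2)

# Hypothetical toy data: numeric timepoints with gaps (no rows for 3 to 5)
toy <- data.frame(
  Genes = rep(c("Gene_A", "Gene_B"), each = 4),
  Timepoints = rep(c(1, 2, 6, 7), times = 2),
  value = c(-1.5, -3.1, -0.7, -0.9, -2.0, -2.4, -1.1, -1.3)
)

# Same conversion as above: fix the level order and the labels
toy$Timepoints <- factor(toy$Timepoints, levels = 1:20,
                         labels = paste0("T", 1:20))

p <- ggplot(toy, aes(x = Timepoints, y = value, group = Genes, col = Genes)) +
  geom_point() +
  geom_line() +
  # keep all 20 levels on the axis, including those with no data
  scale_x_discrete(name = "Timepoints", drop = FALSE)

print(p)
```

With drop = FALSE every timepoint keeps its own slot, so T6 is always the sixth position regardless of which timepoints happen to have data.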

@ zx8754,

Excellent. This fixed the issue.

Another question: as shown in the figure, there is no data point at timepoint T10 because the values are missing for that timepoint. Is there a way to represent such gaps with a dotted line (there are many such line plots), or should they be removed for a better representation? For instance, in ComplexHeatmap or pheatmap we set na_col = "grey" for missing values.

Reply by mohammedtoufiq91, 5.2 years ago

Yes, it is possible.

Please provide example data.

Reply by zx8754, 5.2 years ago

@ zx8754,

Thank you. Here is the data:

dput(Test_v1_ID)
structure(list(Genes = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 
7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 
8L), .Label = c("Gene_A", "Gene_B", "Gene_C", "Gene_D", "Gene_D.1", 
"Gene_E", "Gene_F", "Gene_G"), class = "factor"), Timepoints = c("1", 
"2", "8", "9", "10", "11", "12", "16", "17", "18", "19", "1", 
"2", "8", "9", "10", "11", "12", "16", "17", "18", "19", "1", 
"2", "8", "9", "10", "11", "12", "16", "17", "18", "19", "1", 
"2", "8", "9", "10", "11", "12", "16", "17", "18", "19", "1", 
"2", "8", "9", "10", "11", "12", "16", "17", "18", "19", "1", 
"2", "8", "9", "10", "11", "12", "16", "17", "18", "19", "1", 
"2", "8", "9", "10", "11", "12", "16", "17", "18", "19", "1", 
"2", "8", "9", "10", "11", "12", "16", "17", "18", "19"), value = c("-1.55598", 
"-3.08452", "-0.720558", "-0.901471", NA, "-1.73362", "-1.27953", 
"0.147734", "-0.31916", "-0.48834", "-1.70071", NA, NA, NA, NA, 
NA, NA, NA, NA, "-17.0921", NA, NA, "-1.34066", "-3.05158", "-0.359577", 
"-0.921044", NA, "-1.71127", "-0.954832", "-0.44804", "-0.58607", 
"0.151555", "-0.656842", "-4.6299", "-5.97264", "-4.11533", "-4.24868", 
NA, "-4.26154", "-3.52369", "-2.58611", "-2.98512", "-2.37213", 
"-3.57149", "-2.05066", "-0.657222", "-1.40576", "-2.29293", 
"-0.509917", "-1.68802", NA, "-1.85783", "-1.9242", NA, "-2.33469", 
"-8.35787", "-9.52402", "-9.55285", "-9.5344", "-9.23144", "-9.94065", 
NA, "-8.96788", "-9.01785", "-9.17554", "-9.90749", "-2.06287", 
"-0.846725", "-1.08125", "-1.7152", "-2.01096", "-2.07493", NA, 
"-1.41699", "-1.471", NA, "-1.67149", "-3.83545", "-1.19723", 
"-1.78817", "-1.78302", NA, "-1.11688", NA, "-1.88749", "-2.20363", 
NA, "-1.79198"), X12 = structure(c(Gene_A = 1L, Gene_A = 1L, 
Gene_A = 1L, Gene_A = 1L, Gene_A = 1L, Gene_A = 1L, Gene_A = 1L, 
Gene_A = 1L, Gene_A = 1L, Gene_A = 1L, Gene_A = 1L, Gene_B = 1L, 
Gene_B = 1L, Gene_B = 1L, Gene_B = 1L, Gene_B = 1L, Gene_B = 1L, 
Gene_B = 1L, Gene_B = 1L, Gene_B = 1L, Gene_B = 1L, Gene_B = 1L, 
Gene_C = 1L, Gene_C = 1L, Gene_C = 1L, Gene_C = 1L, Gene_C = 1L, 
Gene_C = 1L, Gene_C = 1L, Gene_C = 1L, Gene_C = 1L, Gene_C = 1L, 
Gene_C = 1L, Gene_D = 1L, Gene_D = 1L, Gene_D = 1L, Gene_D = 1L, 
Gene_D = 1L, Gene_D = 1L, Gene_D = 1L, Gene_D = 1L, Gene_D = 1L, 
Gene_D = 1L, Gene_D = 1L, Gene_D.1 = 2L, Gene_D.1 = 2L, Gene_D.1 = 2L, 
Gene_D.1 = 2L, Gene_D.1 = 2L, Gene_D.1 = 2L, Gene_D.1 = 2L, Gene_D.1 = 2L, 
Gene_D.1 = 2L, Gene_D.1 = 2L, Gene_D.1 = 2L, Gene_E = 2L, Gene_E = 2L, 
Gene_E = 2L, Gene_E = 2L, Gene_E = 2L, Gene_E = 2L, Gene_E = 2L, 
Gene_E = 2L, Gene_E = 2L, Gene_E = 2L, Gene_E = 2L, Gene_F = 2L, 
Gene_F = 2L, Gene_F = 2L, Gene_F = 2L, Gene_F = 2L, Gene_F = 2L, 
Gene_F = 2L, Gene_F = 2L, Gene_F = 2L, Gene_F = 2L, Gene_F = 2L, 
Gene_G = 2L, Gene_G = 2L, Gene_G = 2L, Gene_G = 2L, Gene_G = 2L, 
Gene_G = 2L, Gene_G = 2L, Gene_G = 2L, Gene_G = 2L, Gene_G = 2L, 
Gene_G = 2L), .Label = c("M10.1", "M10.2"), class = "factor")), row.names = c(NA, 
-88L), class = "data.frame")

Reply by mohammedtoufiq91, 5.2 years ago

Accepted answer by zx8754 (score 3), 2019-11-27

We need to create separate data for the dashed lines and then plot it; see the example below, based on your data. I subset it to one facet for simplicity.

library(ggplot2)
library(dplyr)

# example data for one facet
Test_v1_ID$Timepoints <- factor(as.numeric(Test_v1_ID$Timepoints), levels = 1:20, labels = paste0("T", 1:20))
Test_v1_ID$value <- as.numeric(Test_v1_ID$value)
d <- Test_v1_ID[ Test_v1_ID$X12 == "M10.1", ]

# data for dashed lines
dash <- d %>% 
  arrange(Genes, Timepoints) %>% 
  group_by(Genes) %>% 
  mutate(x1 = if_else( is.na(value), lag(Timepoints), factor(NA)),
         x2 = if_else( is.na(value), lead(Timepoints), factor(NA)),
         y1 = if_else( is.na(value), lag(value), NA_real_),
         y2 = if_else( is.na(value), lead(value), NA_real_)) %>% 
  filter(!is.na(y1) & !is.na(y2))

# plot as before, and add dashed lines as segments
ggplot(d, aes(x = Timepoints, y = value, group = Genes, col = Genes)) + 
  geom_point() + 
  geom_line() +
  geom_segment(aes(x = x1, xend = x2, 
                   y = y1, yend = y2 ), data = dash, linetype = "dashed")
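A different technique, not the one used in the answer above: draw a dashed line through all non-missing points first, then draw the solid line on top. geom_line() breaks the line at NA values in the middle of a series, so the dashed layer only shows through where values are missing. Unlike the lag()/lead() version, this also bridges runs of several consecutive missing timepoints. This is a sketch; the toy d below is hypothetical and only stands in for the d prepared in the answer.

```r
library(ggplot2)

# Hypothetical toy data in the same shape as `d` in the answer:
# Gene_X has two consecutive missing values (T9, T10)
d <- data.frame(
  Genes = rep(c("Gene_X", "Gene_Y"), each = 5),
  Timepoints = factor(rep(paste0("T", c(1, 2, 9, 10, 11)), times = 2),
                      levels = paste0("T", 1:20)),
  value = c(-1.6, -3.1, NA, NA, -1.7,
            -4.6, -6.0, -4.1, NA, -4.3)
)

# Rows with an observed value; a line through these bridges every gap
observed <- d[!is.na(d$value), ]

p <- ggplot(d, aes(x = Timepoints, y = value, group = Genes, col = Genes)) +
  # dashed layer: connects consecutive observed points, skipping NAs
  geom_line(data = observed, linetype = "dashed") +
  # solid layer on top: breaks at NA, so only complete stretches are solid
  geom_line() +
  geom_point(na.rm = TRUE)

print(p)
```

Where both layers cover the same stretch, the solid line is drawn over the dashed one, so only the gaps end up dashed.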

Hi @zx8754,

I tried a test run of your R code to understand it better, but I keep getting syntax errors. Am I missing something here?

dash <- d %>% 
  arrange(Genes, Timepoints) %>% 
  group_by(Genes) %>% 
  mutate(x1 = if_elseis.na(value), lag(Timepoints), factor(NA)),
x2 = if_elseis.na(value), lead(Timepoints), factor(NA)),
y1 = if_elseis.na(value), lag(value), NA_real_),
y2 = if_elseis.na(value), lead(value), NA_real_)) %>% 
  filter(!is.na(y1) & !is.na(y2))


dash <- d %>% 
+   arrange(Genes, Timepoints) %>% 
+   group_by(Genes) %>% 
+   mutate(x1 = if_elseis.na(value), lag(Timepoints), factor(NA)),
Error: unexpected ',' in:
"  group_by(Genes) %>% 
  mutate(x1 = if_elseis.na(value), lag(Timepoints), factor(NA)),"
> x2 = if_elseis.na(value), lead(Timepoints), factor(NA)),
Error: unexpected ',' in "x2 = if_elseis.na(value),"
> y1 = if_elseis.na(value), lag(value), NA_real_),
Error: unexpected ',' in "y1 = if_elseis.na(value),"
> y2 = if_elseis.na(value), lead(value), NA_real_)) %>% 
Error: unexpected ',' in "y2 = if_elseis.na(value),"
>   filter(!is.na(y1) & !is.na(y2))
Error in filter(!is.na(y1) & !is.na(y2)) : object 'y1' not found
> 




str(d)
'data.frame':   44 obs. of  4 variables:
 $ Genes     : Factor w/ 8 levels "Gene_A","Gene_B",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Timepoints: Factor w/ 20 levels "T1","T2","T3",..: 1 2 8 9 10 11 12 16 17 18 ...
 $ value     : num  -1.556 -3.085 -0.721 -0.901 NA ...
 $ X12       : Factor w/ 2 levels "M10.1","M10.2": 1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "names")= chr  "Gene_A" "Gene_A" "Gene_A" "Gene_A" ...

Reply by mohammedtoufiq91, 5.2 years ago

Try again.

(The Biostars website sometimes doesn't render code properly, and some parentheses disappear: in your paste, if_else(is.na(value), ...) became if_elseis.na(value), .... It should be OK now.)

Reply by zx8754, 5.2 years ago

@ zx8754 ,

Excellent. Looks great! Thank you.

In my case, should I specify each ID manually, as below? I have 100 of them.

d <- Test_v1_ID[ Test_v1_ID$X12 == "M10.1", "M10.2", "M10.3", "M10.4", "M10.9", ......... ]

Or is there an easier method?

Reply by mohammedtoufiq91, 5.2 years ago

We will need to use loops.

Please avoid asking new questions in the comments; if it is a new question, post it as a new question.
Try searching the web for simple R problems.
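A minimal sketch of such a loop, assuming the column names from the example data (Genes, Timepoints, value, X12) and that Timepoints and value were already converted as in the accepted answer. Instead of typing every ID, it iterates over levels(X12) and writes one page per module to a PDF; the toy data, the helper name plot_module and the output file name are hypothetical. To keep only a chosen set of IDs, %in% does what the comparison in the question was aiming for, e.g. Test_v1_ID[Test_v1_ID$X12 %in% c("M10.1", "M10.2"), ].

```r
library(ggplot2)

# Hypothetical toy data with two modules in X12
toy <- data.frame(
  Genes = rep(c("Gene_A", "Gene_B", "Gene_C", "Gene_D"), each = 3),
  Timepoints = factor(rep(c("T1", "T2", "T8"), times = 4),
                      levels = paste0("T", 1:20)),
  value = c(-1.6, -3.1, -0.7, -1.3, NA, -0.4,
            -4.6, -6.0, -4.1, -8.4, -9.5, -9.6),
  X12 = factor(rep(c("M10.1", "M10.2"), each = 6))
)

# Hypothetical helper: build the plot for a single module ID
plot_module <- function(data, module) {
  d <- data[data$X12 == module, ]
  ggplot(d, aes(x = Timepoints, y = value, group = Genes, col = Genes)) +
    geom_point(na.rm = TRUE) +
    geom_line(na.rm = TRUE) +
    # the dashed-segment layer from the answer could be added here
    ggtitle(module) +
    theme_classic()
}

# Loop over every module instead of listing IDs by hand
out_file <- file.path(tempdir(), "modules.pdf")
pdf(out_file, width = 7, height = 6)
for (m in levels(toy$X12)) {
  print(plot_module(toy, m))
}
invisible(dev.off())
```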

Reply by zx8754, 5.2 years ago

@ zx8754 ,

Thank you very much for the assistance.

Reply by mohammedtoufiq91, 5.2 years ago