How to plot a multiple line graph with Mean and Std Error for following dataset?
6.6 years ago
WUSCHEL ▴ 810

I have a data set of >100 different samples. The samples come from different genotypes (e.g. X, Y, Z) at 4 different time points (T0, T1, T2, T3), with 3 biological replicates each (R1, R2, R3). I'm measuring values for 50 different genes (in rows).

structure(list(Gene = structure(1:2, .Label = c("A", "B"), class = "factor"), 
X_T0_R1 = c(1.46559502, 0.220140568), X_T0_R2 = c(1.087642983, 
0.237500819), X_T0_R3 = c(1.424945196, 0.21066267), X_T1_R1 = c(1.289943948, 
0.207778662), X_T1_R2 = c(1.376535013, 0.488774258), X_T1_R3 = c(1.833390311, 
0.182798731), X_T2_R1 = c(1.450753714, 0.247576125), X_T2_R2 = c(1.3094609, 
0.390028842), X_T2_R3 = c(0.5953716, 1.007079177), X_T3_R1 = c(0.7906009, 
0.730242116), X_T3_R2 = c(1.215333041, 1.012914813), X_T3_R3 = c(1.069312467, 
0.780421013), Y_T0_R1 = c(0.053317766, 3.316414959), Y_T0_R2 = c(0.506623748, 
3.599442788), Y_T0_R3 = c(0.713670106, 2.516735845), Y_T1_R1 = c(0.740998252, 
1.444496448), Y_T1_R2 = c(0.648231834, 0.097957459), Y_T1_R3 = c(0.780499252, 
0.187840968), Y_T2_R1 = c(0.35344654, 1.190274584), Y_T2_R2 = c(0.220223951, 
1.367784148), Y_T2_R3 = c(0.432856978, 1.403057729), Y_T3_R1 = c(0.234963735, 
1.232129062), Y_T3_R2 = c(0.353770497, 0.885122768), Y_T3_R3 = c(0.396091395, 
1.333921747), Z_T0_R1 = c(0.398000559, 1.286528398), Z_T0_R2 = c(0.384759325, 
1.122251177), Z_T0_R3 = c(1.582230097, 0.697419716), Z_T1_R1 = c(1.136843842, 
0.804552001), Z_T1_R2 = c(1.275683837, 1.227821594), Z_T1_R3 = c(0.963349308, 
0.968589683), Z_T2_R1 = c(3.765036263, 0.477443352), Z_T2_R2 = c(1.901023385, 
0.832736132), Z_T2_R3 = c(1.407713024, 0.911920317), Z_T3_R1 = c(0.988333629, 
1.095130142), Z_T3_R2 = c(0.618606729, 0.497458337), Z_T3_R3 = c(0.429823986, 
    0.471389536)), .Names = c("Gene", "X_T0_R1", "X_T0_R2", "X_T0_R3", 
"X_T1_R1", "X_T1_R2", "X_T1_R3", "X_T2_R1", "X_T2_R2", "X_T2_R3", 
"X_T3_R1", "X_T3_R2", "X_T3_R3", "Y_T0_R1", "Y_T0_R2", "Y_T0_R3", 
"Y_T1_R1", "Y_T1_R2", "Y_T1_R3", "Y_T2_R1", "Y_T2_R2", "Y_T2_R3", 
"Y_T3_R1", "Y_T3_R2", "Y_T3_R3", "Z_T0_R1", "Z_T0_R2", "Z_T0_R3", 
"Z_T1_R1", "Z_T1_R2", "Z_T1_R3", "Z_T2_R1", "Z_T2_R2", "Z_T2_R3", 
"Z_T3_R1", "Z_T3_R2", "Z_T3_R3"), class = "data.frame", row.names = c(NA, 
-2L))

For each gene (i.e. for each row), I want to plot a line graph showing the average of the replicates for each genotype, with the standard error (SE).

[Figures "E.g. 1" and "E.g. 2": expected line graph pattern with SE]

For example, for Gene A I want one graph with the time points (0/1/3/5) on the x-axis and all the genotypes (X, Y, Z) drawn as 3 lines, like the plots above.

How is this possible using R? How can I include the standard error? Can I use a loop to generate 50 graphs (a separate graph for each row)?

R • 16k views
The values you furnished above show large deviations. See if the following plot works. The data is taken from the OP:

# Read the tab-separated table (one row per gene, one column per sample)
df=read.csv("df1.txt", sep="\t",stringsAsFactors = FALSE)
library(tidyr)
# Wide to long: one row per gene and sample
df1=gather(df,"TP","Values",-Gene)
library(stringr)
# Split sample names such as X_T0_R1 into genotype, time and replicate
df2=cbind(df1,str_split_fixed(df1$TP,"_",3))
colnames(df2)[4:6]=c("genotype","time","replicate")
library(Rmisc)
# N, mean (in Values), sd, se and ci per time point, gene and genotype
df4=summarySE(df2, measurevar="Values", groupvars=c("time","Gene","genotype"))

library(ggplot2)
ggplot(df4, aes(time, Values, group = genotype, color = genotype)) +
  geom_line() +
  geom_point() +
  facet_wrap( ~ Gene) +
  labs(title = "Gene expression over 16 hr", x = "Time (hr)", y = "Measurement") +
  theme_linedraw() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 20),
    strip.text = element_text(size = 20),
    axis.title.y = element_text(size = 20),
    axis.title.x = element_text(size = 20),
    axis.text.x = element_text(size = 14),
    axis.text.y = element_text(size = 14)
  ) +
  # Ribbon shows mean +/- SD; use the se column instead of sd for mean +/- SE
  geom_ribbon(aes(ymax = Values + sd, ymin = Values - sd),
              alpha = 0.5,
              fill = "grey70",
              colour=NA
              )

[Figure: Rplot01]

Thank you, cpad0112.

I'm working with your code. I get an error after df4=summarySE(df2, measurevar="Values", groupvars=c("time","Gene","genotype")):

Error in summarySE(df2, measurevar = "Values", groupvars = c("time", "Gene", : could not find function "summarySE"

Could you please help me with this?

Sorry, I forgot to add the following line: library(Rmisc). The summarySE function is from the Rmisc package, so load Rmisc first. I have updated the code.
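
If installing Rmisc is not an option, a similar per-group summary can be sketched in base R. This is only a minimal sketch and not what summarySE() does internally (it also reports a confidence interval, which is left out here). The helper name summarise_se and the toy data frame are made up for illustration; the column names mirror df2 from the answer above.

```r
# Toy long-format data in the same shape as df2 (values are made up).
toy <- data.frame(
  Gene      = "A",
  genotype  = rep(c("X", "Y"), each = 6),
  time      = rep(rep(c("T0", "T1"), each = 3), times = 2),
  replicate = rep(c("R1", "R2", "R3"), times = 4),
  Values    = c(1, 2, 3, 2, 4, 6, 5, 5, 5, 1, 3, 5)
)

# Hypothetical helper: N, mean, SD and SE of Values per time/Gene/genotype.
summarise_se <- function(d) {
  out <- aggregate(Values ~ time + Gene + genotype, data = d,
                   FUN = function(v) c(N = length(v), mean = mean(v), sd = sd(v)))
  out <- do.call(data.frame, out)        # flatten the matrix column
  names(out)[4:6] <- c("N", "Values", "sd")
  out$se <- out$sd / sqrt(out$N)         # standard error of the mean
  out
}

toy_summary <- summarise_se(toy)
print(toy_summary)
```

The result has one row per group, with the mean stored in Values and the standard error in se, so it can be passed to the ggplot() call in place of df4.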

Thanks, cpad0112. Sorry to bother you, but I get another error message:

Error in combine_vars(data, params$plot_env, vars, drop = params$drop) : At least one layer must contain all variables used for facetting

Would appreciate your help :)

Well, it would help if you could post your script (the ggplot part) here. Also check that you have a recent version of ggplot.

I see, I do not have ggplot. Is this different from ggplot2?

Where can I get this package? Googling doesn't help :(

df <- read.csv("SI_AVG_Line.csv")

library(tidyr)
df1 <- gather(df,"Transitions","Values",-Targets)
library(stringr)
df2 <- cbind(df1,str_split_fixed(df1$Transitions,"_",3))
colnames(df2)[4:6]=c("genotype","time","replicate")
library(Rmisc)
df4 <- summarySE(df2, measurevar="Values", groupvars=c("time","Targets","genotype"))
ggplot(df4, aes(time, Values, group = genotype, color = genotype)) +
  geom_line() +
  geom_point() +
  facet_wrap( ~ Gene) +
  labs(title = "Gene Expression vs time", x = "Time (d)", y = "Area_counts") +
  theme_linedraw() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 20),
    strip.text = element_text(size = 20),
    axis.title.y = element_text(size = 20),
    axis.title.x = element_text(size = 20),
    axis.text.x = element_text(size = 14),
    axis.text.y = element_text(size = 14)
  ) +
  geom_ribbon(aes(ymax = Values + sd, ymin = Values - sd),
              alpha = 0.5,
              fill = "grey70",
              colour=NA
  )

@cpad0112 Given below is the .csv data I am working with, where the error occurs, in case this is the problem!

structure(list(Targets = c("A", "B", "C", "nor"), X_S1_0d_1 = c(1.940487232,

1.079594087, 1.459871602, 1), X_S1_0d_2 = c(1.940630815, 0.790986517, 0.836386383, 1), X_S1_0d_3 = c(2.05030161, 0.721115111, 0.802144144, 1), X_S1_1d_1 = c(0.927368618, 1.186737277, 0.765095737, 1), X_S1_1d_2 = c(1.159347963, 1.427045976, 1.196499915, 1), X_S1_1d_3 = c(1.009271935, 1.049367585, 0.748728559, 1), X_S1_3d_1 = c(0.794781558, 1.072762904, 1.288591327, 1), X_S1_3d_2 = c(0.698642658, 0.971534921, 0.923846091, 1), X_S1_3d_3 = c(0.938922191, 0.80228642, 1.433899521, 1), X_S1_5d_1 = c(0.768844884, 1.458863535, 0.880239008, 1), X_S1_5d_2 = c(0.586314866, 1.027767798, 0.831469797, 1), X_S1_5d_3 = c(0.604124099, 1.502330028, 1.101895903, 1), mut2_S1_0d_1 = c(2.085432338, 0.861943427, 0.509210189, 1), mut2_S1_0d_2 = c(1.774970153, 1.074569974, 3.128664718, 1), mut2_S1_0d_3 = c(2.003870102, 0.753483213, 1.047020362, 1), mut2_S1_1d_1 = c(1.168381858, 1.15001272, 0.580462548, 1), mut2_S1_1d_2 = c(1.33284456, 0.450460567, 0.959430252, 1), mut2_S1_1d_3 = c(1.106332747, 0.466636391, 0.660254618, 1), mut2_S1_3d_1 = c(0.859543853, 1.188445442, 1.044546139, 1), mut2_S1_3d_2 = c(1.022929555, 1.259366417, 1.776709656, 1), mut2_S1_3d_3 = c(0.917527143, 2.137370791, 0.669765284, 1), mut2_S1_5d_1 = c(0.642810843, 0.496709803, 0.801885112, 1), mut2_S1_5d_2 = c(0.879777521, 1.170165217, 1.793443182, 1), mut2_S1_5d_3 = c(0.816650769, 0.864352103, 0.768312731, 1), mut5_S1_0d_1 = c(1.936291138, 0.721197246, 1.885982652, 1), mut5_S1_0d_2 = c(2.136240851, 0.925363277, 0.282462799, 1), mut5_S1_0d_3 = c(1.986120429, 0.677085837, 0.124936834, 1), mut5_S1_1d_1 = c(1.346339786, 0.989266319, 1.396700558, 1), mut5_S1_1d_2 = c(1.489199506, 1.269083963, 1.48921516, 1), mut5_S1_1d_3 = c(1.584229502, 0.88246637, 2.25267634, 1), mut5_S1_3d_1 = c(0.755948531, 1.451613602, 0.898362008, 1), mut5_S1_3d_2 = c(0.824308907, 0.5962476, 0.523055204, 1), mut5_S1_3d_3 = c(0.753359409, 0.753222103, 0.948441646, 1), mut5_S1_5d_1 = c(0.788525215, 1.85338769, 0.951693842, 1), mut5_S1_5d_2 = c(1.010417043, 
1.983625345, 1.086768544, 1), mut5_S1_5d_3 = c(0.630454563, 1.439599004, 1.416591771, 1), mut7_S1_0d_1 = c(1.672072567, 0.611243763, 0.705364938, 1), mut7_S1_0d_2 = c(1.738837658, 0.503828595, 0.499147343, 1), mut7_S1_0d_3 = c(2.149037252, 1.192787265, 1.226895377, 1), mut7_S1_1d_1 = c(1.421761015, 1.084490092, 0.497815065, 1), mut7_S1_1d_2 = c(1.068782794, 0.584950798, 0.38078948, 1), mut7_S1_1d_3 = c(1.229045044, 0.822348277, 0.449995849, 1), mut7_S1_3d_1 = c(0.890386073, 0.802513638, 0.757190729, 1), mut7_S1_3d_2 = c(1.022619118, 0.806565748, 0.645204575, 1), mut7_S1_3d_3 = c(0.80348663, 0.753993198, 0.593817113, 1), mut7_S1_5d_1 = c(0.780575903, 0.724989068, 0.92248483, 1), mut7_S1_5d_2 = c(0.743592574, 1.279872561, 1.201887432, 1), mut7_S1_5d_3 = c(0.522476113, 0.751493063, 0.899865367, 1), mut9_S1_0d_1 = c(1.247510942, 0.762934403, 2.009134613, 1), mut9_S1_0d_2 = c(1.159843529, 0.684622155, 0.499925077, 1), mut9_S1_0d_3 = c(1.247510942, 0.762934403, 2.205521099, 1), mut9_S1_1d_1 = c(1.139288266, 0.530593446, 0.767442607, 1), mut9_S1_1d_2 = c(1.257958733, 0.780701299, 0.77153391, 1), mut9_S1_1d_3 = c(1.230762109, 0.536139676, 0.742313942, 1), mut9_S1_3d_1 = c(0.809093089, 0.59528538, 0.804481151, 1), mut9_S1_3d_2 = c(0.853017549, 0.826757331, 1.141960538, 1), mut9_S1_3d_3 = c(0.813029821, 0.748971384, 1.964723247, 1), mut9_S1_5d_1 = c(0.797277294, 1.327830526, 0.943500196, 1), mut9_S1_5d_2 = c(0.669946954, 1.011869145, 0.979867227, 1), mut9_S1_5d_3 = c(0.525670301, 1.067407334, 0.76001394, 1)), .Names = c("Targets", "X_S1_0d_1", "X_S1_0d_2", "X_S1_0d_3", "X_S1_1d_1", "X_S1_1d_2", "X_S1_1d_3", "X_S1_3d_1", "X_S1_3d_2", "X_S1_3d_3", "X_S1_5d_1", "X_S1_5d_2", "X_S1_5d_3", "mut2_S1_0d_1", "mut2_S1_0d_2", "mut2_S1_0d_3", "mut2_S1_1d_1", "mut2_S1_1d_2", "mut2_S1_1d_3", "mut2_S1_3d_1", "mut2_S1_3d_2", "mut2_S1_3d_3", "mut2_S1_5d_1", "mut2_S1_5d_2", "mut2_S1_5d_3", "mut5_S1_0d_1", "mut5_S1_0d_2", "mut5_S1_0d_3", "mut5_S1_1d_1", "mut5_S1_1d_2", "mut5_S1_1d_3", 
"mut5_S1_3d_1", "mut5_S1_3d_2", "mut5_S1_3d_3", "mut5_S1_5d_1", "mut5_S1_5d_2", "mut5_S1_5d_3", "mut7_S1_0d_1", "mut7_S1_0d_2", "mut7_S1_0d_3", "mut7_S1_1d_1", "mut7_S1_1d_2", "mut7_S1_1d_3", "mut7_S1_3d_1", "mut7_S1_3d_2", "mut7_S1_3d_3", "mut7_S1_5d_1", "mut7_S1_5d_2", "mut7_S1_5d_3", "mut9_S1_0d_1", "mut9_S1_0d_2", "mut9_S1_0d_3", "mut9_S1_1d_1", "mut9_S1_1d_2", "mut9_S1_1d_3", "mut9_S1_3d_1", "mut9_S1_3d_2", "mut9_S1_3d_3", "mut9_S1_5d_1", "mut9_S1_5d_2", "mut9_S1_5d_3"), class = "data.frame", row.names = c(NA, -4L))

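The issue with the code is here: df2 <- cbind(df1,str_split_fixed(df1$Transitions,"_",3)). When you split a name such as mut9_S1_3d_2 on _, you get 4 strings, but you are asking for only 3, so the last piece comes back as 3d_2 and the columns you named time and replicate hold the wrong fields. Also, your plot facets on Gene while the column in your data is called Targets, which is what the facetting error is complaining about. And I meant ggplot2, not ggplot.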
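
To see the four fields, it can help to split a couple of column names by hand. The sketch below uses base R's strsplit() instead of stringr purely to stay dependency-free. parse_sample is a hypothetical helper name, and the field labels a, b, time and replicate simply follow the code in this answer.

```r
# Two of the posted column names, each with four underscore-separated fields.
sample_names <- c("X_S1_0d_1", "mut9_S1_3d_2")

# Hypothetical helper: split on "_" and label the four resulting fields.
parse_sample <- function(x) {
  parts <- do.call(rbind, strsplit(x, "_", fixed = TRUE))
  colnames(parts) <- c("a", "b", "time", "replicate")
  data.frame(parts,
             mut = paste(parts[, "a"], parts[, "b"], sep = "_"),
             stringsAsFactors = FALSE)
}

parsed <- parse_sample(sample_names)
print(parsed)
```

Here the time column holds 0d and 3d, and mut holds the rebuilt genotype labels X_S1 and mut9_S1, which is the layout the plotting code expects.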

code:

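# test is assumed to hold the data frame posted above (the dput() output)
library(tidyr)
df1=gather(test,"TP","Values",-Targets)

library(stringr)
# Names such as mut9_S1_3d_2 have 4 fields separated by _
df2=cbind(df1,str_split_fixed(df1$TP,"_",4))
colnames(df2)[4:7]=c("a","b","time","replicate")
# Rebuild the genotype label, e.g. mut9_S1
df2$mut = paste(df2$a, df2$b, sep="_")

library(dplyr)
df3=select(df2, -c(a,b))

library(Rmisc)
names(df3)
# N, mean (in Values), sd, se and ci per time point, genotype and target
df4=summarySE(df3, measurevar="Values", groupvars=c("time","mut","Targets"))
View(df4)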

library(ggplot2)
ggplot(df4, aes(time, Values, group = mut, color = mut)) +
  geom_line() +
  geom_point() +
  facet_wrap( ~ Targets) +
  labs(title = "Gene expression", x = "Time (d)", y = "Measurement") +
  theme_linedraw() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 20),
    strip.text = element_text(size = 20),
    axis.title.y = element_text(size = 20),
    axis.title.x = element_text(size = 20),
    axis.text.x = element_text(size = 14),
    axis.text.y = element_text(size = 14)
  ) +
  geom_ribbon(aes(ymax = Values + sd, ymin = Values - sd),
              alpha = 0.3,
              fill = "grey70",
              colour=NA
  )

[Figure: Rplot01]

With error bars only (mean +/- SD):

[Figure: Rplot02_errorbar]

For error bars, replace the following part of the code above:

geom_ribbon(aes(ymax = Values + sd, ymin = Values - sd),
              alpha = 0.3,
              fill = "grey70",
              colour=NA
  )

with:

geom_pointrange(aes(ymax=Values+sd, ymin=Values-sd))
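
The original question also asked for standard error and for one graph per gene. summarySE() returns an se column next to sd, so Values + se and Values - se can be used in the aes() calls above to show mean +/- SE. For separate graphs, one option is to loop over the genes. What follows is only a sketch in base R graphics, chosen to avoid further packages. plot_gene, the toy summary table and the output file are illustrative assumptions, with columns named as in df4. Time is assumed to be numeric here, which for df4 could be obtained with something like as.numeric(sub("d", "", df4$time)).

```r
# Toy summary in the shape of df4 (made-up numbers): one row per
# Targets x mut x time, with the mean in Values and its standard error in se.
toy <- expand.grid(time = c(0, 1, 3, 5), mut = c("X_S1", "mut2_S1"),
                   Targets = c("A", "B"), stringsAsFactors = FALSE)
toy$Values <- seq(1, 2, length.out = nrow(toy))
toy$se     <- 0.1

# Hypothetical helper: draw one gene, one line per genotype, mean +/- SE.
plot_gene <- function(d, gene) {
  g <- d[d$Targets == gene, ]
  muts <- unique(g$mut)
  plot(g$time, g$Values, type = "n",
       ylim = range(c(g$Values - g$se, g$Values + g$se)),
       xlab = "Time (d)", ylab = "Measurement", main = gene)
  for (i in seq_along(muts)) {
    m <- g[g$mut == muts[i], ]
    m <- m[order(m$time), ]
    lines(m$time, m$Values, type = "b", col = i, pch = 16)
    # Error bars: vertical segments with flat caps at mean - SE and mean + SE.
    arrows(m$time, m$Values - m$se, m$time, m$Values + m$se,
           angle = 90, code = 3, length = 0.03, col = i)
  }
  legend("topleft", legend = muts, col = seq_along(muts), lty = 1, pch = 16)
  invisible(nrow(g))
}

# One page per gene in a single PDF (written to a temporary file here).
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
genes <- unique(toy$Targets)
for (gene in genes) plot_gene(toy, gene)
invisible(dev.off())
```

With 50 genes the same loop produces a 50-page PDF. The loop structure should carry over to ggplot2 by subsetting df4 per gene inside the loop and saving each plot, for example with ggsave().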

Thank you, cpad0112. Thank you for helping me this much even though you do not know me personally. I will never forget your help. When the right time comes, I promise I'll acknowledge all the people who helped me without knowing me. I wish I knew who you are (for me, being anonymous is best). I appreciate your time and kindness!!! Wishing you the best!!!

No problem, and thank you. Whichever post helped you resolve the issue, mark it as the answer. Good luck with your research.

Note: Help or a suggestion received on Biostars is help from the forum as a whole, and likewise a question or issue raised here is not an individual matter but a subject/knowledge-related one. If you wish to acknowledge any help or suggestion from this forum, you can cite or acknowledge the forum in technical writing (thesis, manuscript, presentation, abstract, etc.). Please contact the admins for instructions on how to cite the Biostars forum. Please note that the acknowledgement/citation mentioned above is merely a suggestion, not mandatory.
