R add r-squared and p-value
5.4 years ago

Hello, I have written an R script that saves linear regression plots to a folder, but I would also like to show the r-squared and the p-value of each lm() fit on these plots (my data contains NAs). How can I do this?

library(ggplot2)
library(ggpmisc)

setwd(work)
files <- list.files(path = "data", pattern = (".csv$"))

for (k in 1:length(files)) {
  fname <- files[k]
  cat(paste0("Now analyse data/", fname, "...\n"))
  data <- read.csv2(paste0("data/", fname), header = T, stringsAsFactors = F, dec = ",")
  setwd(graphe)  # make one folder per file
  newdir <- paste0(fname)
  dir.create(newdir)

  for (i in 2:ncol(data)) {
    cwd <- getwd()
    setwd(newdir)

    jpeg(paste(i, "jpeg", sep = "."), width = 15, height = 12, units = "cm", quality = 75, res = 300)
    p <- ggplot(data) +
      geom_point(aes_string(y = "score", x = colnames(data)[i])) +
      geom_smooth(aes_string(y = "score", x = colnames(data)[i]), method = lm, colour = "red", fill = "red", alpha = 0.25) +
      theme_classic()
    print(p)
    dev.off()
    setwd(cwd)
  }
  setwd(work)
}

Thank you in advance

ADD COMMENT
  1. Changing working directories within code is unnecessary, even dangerous. You have no reason to change working directories here.
  2. Why the parentheses in pattern = (".csv$")?
  3. The line newdir <- paste0(fname) makes no sense and is not required. newdir <- fname should suffice.
  4. Saying this again because it makes the script ugly and difficult to follow - why the constant working directory change in loops?
  5. Why jpeg() and dev.off() instead of the more elegant ggsave()?
  6. Why save the ggplot() object to p only to print() it? Why not just call ggplot() and be done with it?
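
As a minimal sketch of points 1 and 5, the loop below saves one plot per column with ggsave() and a full output path, without calling setwd(). The data frame, the column names var1 and var2, and the output folder under tempdir() are all made up for illustration; substitute your own. The plot is passed to ggsave() explicitly through its plot argument.

```r
library(ggplot2)

# Hypothetical data and output folder, for illustration only
my_data <- data.frame(
  score = c(1, 2, 4, 3, 5),
  var1  = c(2, NA, 5, 4, 6),
  var2  = c(5, 3, 4, 1, 2)
)
out_dir <- file.path(tempdir(), "graphe", "example")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# One plot per predictor column, written directly to out_dir
for (x_col in setdiff(colnames(my_data), "score")) {
  p <- ggplot(my_data, aes(x = .data[[x_col]], y = score)) +
    geom_point(na.rm = TRUE) +
    geom_smooth(method = lm, formula = y ~ x, colour = "red", fill = "red",
                alpha = 0.25, na.rm = TRUE) +
    theme_classic()
  ggsave(file.path(out_dir, paste0(x_col, ".jpeg")), plot = p,
         width = 15, height = 12, units = "cm", dpi = 300)
}
```

Because the file name carries the whole path, the working directory never changes, and ggsave() opens and closes the graphics device itself.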
ADD REPLY

Thank you for your answer (My data has NAs)

  1. As I want to save the plots (among other things) in specific places, I thought the best way was to keep changing the working directory.

  2. The parentheses were recommended by my supervisor. As I'm a beginner in R, I followed everything he told me.

  3. Okay, thanks, I didn't know.

  4. As explained above, I regularly change directories to save the plots, etc. in a specific place.

  5. I found the jpeg() and dev.off() technique on the internet, because I didn't really understand how to save a plot with ggsave().

  6. By using ggsave(), I think we can remove the print() step.

ADD REPLY

Everyone starts out with bad code. If not for that, we could never start learning. You're already doing things right, you only need to build consistency. For example, see how the read.csv2 uses paste0() to read a file within a sub-folder? The same thing can be done with everything else.

library(ggplot2)
library(ggpmisc)

## Let's say you're in /work
work_dir <- '/work'
setwd(work_dir)  # This should not be done but we can leave it be for now

## Now you wish to list all /work/data/*.csv
files <- list.files(path = "data", pattern = "\\.csv$")  # escape the dot so it matches a literal "."

for (k in seq_along(files)) {  # seq_along() is safe when no files are found
  fname <- files[k]
  cat(paste0("Now analyse data/", fname, "...\n"))
  # Do not use 'data' as a variable name as it is a built in function. It's bad coding practice
  my_data <- read.csv2(paste0("data/", fname), header = TRUE, stringsAsFactors = FALSE, dec = ",")

  ## Let's say you wish to save all plots under /work/graphe/file_name/

  #setwd(graphe)  # make one folder per file
  newdir <- fname

  graphe_outdir <- paste(work_dir, 'graphe', newdir, sep = "/")
  # recursive = TRUE also creates /work/graphe if it does not exist yet
  dir.create(graphe_outdir, recursive = TRUE)  # Why are you creating directories named ".csv"?

  for (i in 2:ncol(my_data)) {
    # No need to switch working directories to save a file to a different directory:
    # the filename argument of jpeg() can point to any location you have write permission for
    jpeg(paste0(graphe_outdir, '/', i, ".jpeg"), width = 15, height = 12, units = "cm", quality = 75, res = 300)
    # Inside a for loop a ggplot is not printed automatically, so print() it explicitly
    print(
      ggplot(my_data) +
        geom_point(aes_string(y = "score", x = colnames(my_data)[i])) +
        geom_smooth(aes_string(y = "score", x = colnames(my_data)[i]), method = lm, colour = "red", fill = "red", alpha = 0.25) +
        theme_classic()
    )
    dev.off()  # Sorry, forgot this line earlier
  }
  #setwd(work)
}
ADD REPLY

Thank you for your suggestion, but R does not recognize the different folders and files. I will take your comments into account and try to improve my code. Thank you.

ADD REPLY

R does not recognize the different folders and files

Sorry, what? That does not make sense. If there is a particular error message, we can figure out what's going on.

ADD REPLY

It was actually a mistake on my part: the folders are created correctly. However, you still have to store the ggplot in a variable and then print it, because otherwise no graph is saved. Thank you, I will be able to make my script clearer. Thank you for your time.

Edit: I don't want to name the folder after the csv file including its extension, but after the file name without the extension.

ADD REPLY

Sorry, I'd forgotten a dev.off() line (which can be avoided if ggsave() is used instead of jpeg() ... dev.off()). You don't need to save the plot to a variable, but you are right that it has to be printed: inside a for loop a ggplot object is not printed automatically, so wrap the ggplot() call in print().
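
A small sketch of the jpeg() ... dev.off() pattern inside a loop, using the built-in mtcars data set and a temporary file as stand-ins for your own data and paths. At the top level of a script or the console, R prints a ggplot object automatically; inside a for loop it does not, so the explicit print() is what actually draws to the open device.

```r
library(ggplot2)

# Temporary output file, for illustration only
out_file <- file.path(tempdir(), "loop_plot.jpeg")

jpeg(out_file, width = 15, height = 12, units = "cm", quality = 75, res = 300)
for (i in 1) {
  # No variable needed, but print() is required inside the loop
  print(ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() + theme_classic())
}
dev.off()
```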

There should be some function to get the file's basename without extension from the full filename. See the tools package.
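
One possibility, as a sketch: tools::file_path_sans_ext() strips the extension and basename() drops any leading directories. The file name and the "graphe" folder below are hypothetical placeholders.

```r
# Hypothetical file name, for illustration only
fname <- "data/sample_01.csv"

# Drop the directory part, then the extension
folder_name <- tools::file_path_sans_ext(basename(fname))

# Build the output path without changing the working directory
graphe_outdir <- file.path("graphe", folder_name)
print(graphe_outdir)
```

The resulting path can then be passed to dir.create() in place of one built from the raw file name.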

ADD REPLY
5.4 years ago
davidtmscott ▴ 90

If you use ggscatter() from the ggpubr package, you can add a regression line, then add its equation with stat_regline_equation() and its correlation and p-value with stat_cor() (by default, stat_cor() prints R rather than r^2).

Here is some basic code you can adapt and add to.

  library(ggpubr)

  # The first argument must be the data frame, not the loop index i
  p <- ggscatter(data, x = colnames(data)[i], y = "score",
                 size = 0.5,
                 add = "reg.line", conf.int = TRUE,
                 add.params = list(color = "black", fill = "grey", size = 0.75),
                 title = NULL, show.legend.text = FALSE) +
    stat_cor() +
    stat_regline_equation()
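
If r^2 is wanted instead of R, a possible variation is to map the label in stat_cor() to its computed rr.label and p.label variables. This sketch uses the built-in mtcars data set as a stand-in for your data and assumes a ggpubr version that provides rr.label and a ggplot2 version that provides after_stat().

```r
library(ggplot2)
library(ggpubr)

# mtcars is used here only as example data
p <- ggscatter(mtcars, x = "wt", y = "mpg",
               add = "reg.line", conf.int = TRUE) +
  # Show R-squared and the p-value instead of the default R and p
  stat_cor(aes(label = paste(after_stat(rr.label), after_stat(p.label),
                             sep = "~`,`~")))
print(p)
```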

Alternatively you could just run the linear model separately, save the values as variables and then add them to the plot as a label.
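
A rough sketch of that alternative, under some assumptions: the data frame and the column name var1 are invented, the helper lm_label() is a hypothetical name, and the p-value shown is the one for the slope of a single-predictor model. lm() drops rows containing NA by default, so missing values need no special handling here.

```r
library(ggplot2)

# Hypothetical example data with missing values; replace with your own
set.seed(1)
my_data <- data.frame(score = rnorm(30), var1 = rnorm(30))
my_data$var1[c(3, 7)] <- NA

# Hypothetical helper: fit y ~ x and return a label with R-squared and p-value
lm_label <- function(df, x, y = "score") {
  fit <- lm(reformulate(x, response = y), data = df)  # NA rows are omitted by default
  s <- summary(fit)
  r2 <- s$r.squared
  p_val <- s$coefficients[2, "Pr(>|t|)"]  # p-value of the slope
  sprintf("R-squared = %.3f, p = %.3g", r2, p_val)
}

x_col <- "var1"
p <- ggplot(my_data, aes(x = .data[[x_col]], y = score)) +
  geom_point(na.rm = TRUE) +
  geom_smooth(method = lm, formula = y ~ x, colour = "red", fill = "red",
              alpha = 0.25, na.rm = TRUE) +
  labs(subtitle = lm_label(my_data, x_col)) +  # label shown under the title
  theme_classic()
print(p)
```

Putting the label in the subtitle avoids having to choose coordinates for it; annotate("text", ...) would work too if the label should sit inside the panel.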

Hope this helps.

ADD COMMENT

Thank you for your suggestion, but R gives me an error: Error in data[, x] : incorrect number of dimensions

Edit: Sorry, you were right! THANK YOU! I had mistakenly left i as the first argument of ggscatter(), where data should go. Thank you for your help with this function.

ADD REPLY
