Running pathfindR package generates an error message
4.2 years ago
peter.berry5 ▴ 60

I'm using the pathfindR package to perform PSEA on a set of DE proteins from a proteomics experiment. If I understand the package correctly, one of the outputs should be a data frame, which I have called PSEA_results.

When I run the package, that data frame isn't created and I get the following error message:

Error in (function (filename = "Rplot%03d.png", width = 480, height = 480, : invalid 'filename'

This is a new one on me. My code is below.

library ("KEGGREST")
library ("KEGGgraph")
library ("pathfindR") 
library ("tidyverse")
library ("tidyr")
library ("dplyr")
library ("magrittr")

#remove all previous files

#data summary needs pvalues added

#Data Summary of Progenesis Data, Enriched GO Terms and Enriched KEGG Pathways
attached_CHO_all_data <- read.csv("C:/PETER PROJECT/3d. Mass Spec Processing/Results of R Analysis/Data Summary/Effect of Culture Format/WCL Analysis/DE Cutoff equal 1.2/0% Attached v 0% Suspension/DE 1-2 0% Attached v Susp WCL Apr19 (CHO Database) - All Data.csv")

A_v_S_Comp_0rd <- read.csv("C:/PETER PROJECT/3d. Mass Spec Processing/Results of R Analysis/DE Normalised Abundance Data/Effect of Culture Format/WCL Analysis/DE Cutoff equal 1.2/0% Attached v 0% Suspension/DE 1-2 0% Attached v Susp Normalised Abundance Data WCL Apr19.csv")

#find the average of each row across the attached-cell replicates (A0b, A0c, A0d)
Aver_A0 <- rowMeans(data.frame(A_v_S_Comp_0rd$A0b, A_v_S_Comp_0rd$A0c, A_v_S_Comp_0rd$A0d))
Aver_A0 <- data.frame(A_v_S_Comp_0rd$description, Aver_A0)

#find the average of each row relating to the 4 replicates of suspension cells
Aver_S0 <- rowMeans(data.frame(A_v_S_Comp_0rd$S0a, A_v_S_Comp_0rd$S0b, A_v_S_Comp_0rd$S0c, A_v_S_Comp_0rd$S0d))
Aver_S0 <- data.frame(A_v_S_Comp_0rd$description, Aver_S0)

#Divide each row of dataframe Aver_S0 by the corresponding row in dataframe Aver_A0
FC <- cbind(Aver_S0[1],round(Aver_S0[-1]/Aver_A0[-1],3))
names(FC)[2] = "FC"

#Get log2 of S0/A0
Log2FC = log2(FC$FC)

#select desired columns
PSEA_1<-  data.frame(A_v_S_Comp_0rd$description, Log2FC, A_v_S_Comp_0rd$Anova..p)

#split column 1 in the dataframe PSEA_1 into 2 separate columns using the phrase Refseq: as the separator and call it PSEA_2
PSEA_2<- tidyr::separate (PSEA_1, 1, into=c("HGNC_Symbol", "Refseq"), sep = "Refseq:")

#further split column 1 in the dataframe PSEA_2 into 2 separate columns using = as the separator and call it PSEA_3
PSEA_3<- tidyr::separate (PSEA_2, 1, into=c("gene", "HGNC_Symbol"), sep = "=") 

#remove unwanted columns
PSEA_4 = subset(PSEA_3, select = -c(gene,Refseq))
PSEA_rd <- distinct(PSEA_4,HGNC_Symbol, .keep_all= TRUE)

#trim trailing white space from the gene symbols
PSEA_rd$HGNC_Symbol <- trimws(PSEA_rd$HGNC_Symbol, which = "right", whitespace = "[ \t\r\n]")

#rename columns and sort on Gene.Symbol
names(PSEA_rd)[1] <- "Gene.Symbol"
names(PSEA_rd)[2] <- "LogFC"
names(PSEA_rd)[3] <- "adj.P.Val"
PSEA_rd <- arrange(PSEA_rd, Gene.Symbol, by_group = FALSE)

# PSEA_rd is the raw data file for PSEA

#set a directory to store the results in. The default directory is pathfindR_Results

PSEA_results <- run_pathfindR(PSEA_rd, output_dir = ("C:/PETER PROJECT/3d. Mass Spec Processing/Results of R Analysis/pathfindR/Culture Format/WCL/0% Attach v 0% Susp")) 

    #write.csv (PSEA_results, file = "C:/PETER PROJECT/3d. Mass Spec Processing/Results of R Analysis/pathfindR/Effect of Culture Format/WCL Analysis/0% Attached v 0% Suspension/DE Cutoff equal 1.2/PSEA_Attached_0.csv")

The data frame PSEA_rd has over 1000 rows, so I don't propose to reproduce it here. However, so people can see the types of data it contains, the output from str(PSEA_rd) is below.

'data.frame':   1171 obs. of  3 variables:
 $ Gene.Symbol: chr  "AAAS" "AACS" "AARS" "AATF" ...
 $ LogFC      : num  -0.648 0.82 1.837 -0.492 -1.114 ...
 $ adj.P.Val  : num  7.15e-05 4.59e-03 4.71e-05 1.38e-02 1.00e-02 ...

I did try one other thing. I ran

PSEA_results <- run_pathfindR(PSEA_rd)

as per the post "Enrichment Analysis, Clustering and Scoring with pathfindR". The result was the same error message, which hopefully rules out the directory structure I want to store the results in as the cause.

Thanks in advance for any suggestions.

pathfindR R
4.2 years ago
peter.berry5 ▴ 60

This can be closed. I've managed to fix the problem. The columns in the PSEA_rd data frame needed to be named

Gene.symbol, not Gene.Symbol

and logFC, not LogFC
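
For anyone who hits the same error, here is a minimal sketch of the rename. The small data frame is only a stand-in for the real PSEA_rd (its values are copied from the first few entries of the str() output in the question), and the run_pathfindR() call is left commented out because it needs the full data set.

```r
# Toy stand-in for PSEA_rd; values copied from the str() output in the question
PSEA_rd <- data.frame(
  Gene.Symbol = c("AAAS", "AACS", "AARS", "AATF"),
  LogFC       = c(-0.648, 0.82, 1.837, -0.492),
  adj.P.Val   = c(7.15e-05, 4.59e-03, 4.71e-05, 1.38e-02),
  stringsAsFactors = FALSE
)

# Rename the first two columns to the capitalisation reported as the fix
names(PSEA_rd)[names(PSEA_rd) == "Gene.Symbol"] <- "Gene.symbol"
names(PSEA_rd)[names(PSEA_rd) == "LogFC"] <- "logFC"

# Confirm the new column names before rerunning the analysis
print(names(PSEA_rd))

# Then rerun on the real data:
# PSEA_results <- run_pathfindR(PSEA_rd)
```

Matching on the old names rather than on column positions means the rename does nothing if the columns have already been fixed, so the lines are safe to rerun.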


