Using a for loop to add headers to multiple files
5.8 years ago

Hi guys! I need help setting up a for loop in R (I'm quite new to programming in R). I would like to add the same header to all the files in a folder that match a specific pattern.

To get the list of files I'm using the following code:

filelist <- list.files(pattern = "DESeq2_result*")

And this is the for loop I am trying to implement:

for (i in seq_along(filelist)) {
   names[[i]] <- a
   out [i]

}

where a is a vector that I defined with the names of the different columns:

a <- c("gene_id", "baseMean", "log2FC",
       "SD", "WaldStatistic", "pval", "padj")

If you know of any tutorial or page that would help me learn and practise writing functions and loops in R, it would be much appreciated.

Thank you so much in advance!

Jordi


What error do you get? What would you like to get out of the loop? The same files, but with a changed header?

5.8 years ago

As I understand it, you want to read the files whose names match the pattern "DESeq2_result" as data frames in a list, assign new column names to these data frames from the vector a, and perhaps name the list elements after the file names? You can do it this way:

filelist <- list.files(pattern = "DESeq2_result")

a <- c("gene_id", "baseMean", "log2FC",
       "SD", "WaldStatistic", "pval", "padj")

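data <- list()

# Read each file and assign the column names
for (i in seq_along(filelist)) {
  data[[i]] <- read.table(filelist[i])
  colnames(data[[i]]) <- a
}

names(data) <- filelist

# Write each data frame to a new tab-separated file prefixed with "colnames_"
for (i in seq_along(data)) {
  output_name <- paste0("colnames_", names(data)[i])
  write.table(data[[i]], output_name, quote = FALSE, row.names = FALSE, col.names = TRUE, sep = "\t")
}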

EDIT: added the write.table part to take your comment into account in order to export the imported files with the column names.
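
Since you asked about practising functions as well as loops, here is a minimal sketch of the same idea wrapped in a function. The helper name add_header and the "colnames_" prefix argument are my own choices for illustration, and I am assuming the files are tab-separated with exactly seven columns and no header line:

```r
# Column names from the question.
a <- c("gene_id", "baseMean", "log2FC",
       "SD", "WaldStatistic", "pval", "padj")

# add_header() is a hypothetical helper, not part of any package.
# It reads one headerless tab-separated file and writes a copy with column names.
add_header <- function(infile, col_names, prefix = "colnames_") {
  df <- read.table(infile, sep = "\t", header = FALSE, col.names = col_names)
  outfile <- file.path(dirname(infile), paste0(prefix, basename(infile)))
  write.table(df, outfile, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = TRUE)
  invisible(outfile)  # return the output path without printing it
}

# The ^ anchor skips files already written as "colnames_..." on a re-run.
filelist <- list.files(pattern = "^DESeq2_result")
outfiles <- vapply(filelist, add_header, character(1), col_names = a)
```

The function does the work for one file, and vapply (or a plain for loop over filelist) repeats it, so the eight copies in your Rmarkdown document collapse into one call.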


Basically, what I want to do is add column names to all my files with a single piece of code. For a single file, the code would be:

data <- read.table("file", sep = "\t", header = FALSE)
names(data)
colnames(data) <- a
write.table(data, "file", sep = "\t", col.names = TRUE, row.names = FALSE)

I'm trying to optimise my code so that I can process all of them without having to type the same code 8 times in my Rmarkdown document (once for each file).


Ok, I modified the code so it does what you want.

I think a much better way would be to simply prepend the header line to every file with a shell script or a one-liner. You don't need R for that. The code above is really heavy for what you need. Pasting a header line stored in a file (say, a) at the beginning of every file that matches the pattern would be much more efficient, but not R-based.
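
If you would rather stay inside your Rmarkdown document, the same "paste a header line on top" idea can be sketched in R by treating each file as plain text instead of parsing it into a data frame. The helper name prepend_header and the "with_header_" output prefix are made up for this example, and I am assuming the files are tab-separated and small enough to read into memory:

```r
# Header fields from the question, joined into one tab-separated line.
a <- c("gene_id", "baseMean", "log2FC",
       "SD", "WaldStatistic", "pval", "padj")
header_line <- paste(a, collapse = "\t")

# prepend_header() is a hypothetical helper: it copies the file line by line
# and puts the header in front, without ever parsing the columns.
prepend_header <- function(infile, outfile, header) {
  body <- readLines(infile)
  writeLines(c(header, body), outfile)
  invisible(outfile)
}

# Write a new file next to each original; the originals stay untouched.
for (f in list.files(pattern = "^DESeq2_result")) {
  prepend_header(f, paste0("with_header_", f), header_line)
}
```

Because nothing is parsed, the data lines come out byte-for-byte as they went in; the trade-off is that nothing checks whether each file really has seven columns.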


OK, thank you so much! So basically you are suggesting that I do this with a shell for loop and a sed/awk command instead. Thank you again for your help! Cheers!

5.8 years ago
zx8754 12k

Use the col.names argument when reading the files, then write them out, something like this:

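# row.names = FALSE keeps the header aligned with the data columns
for (i in list.files(pattern = "^DESeq2_result_"))
  write.table(read.table(i, col.names = c("gene_id", "baseMean", "log2FC",
                                          "SD", "WaldStatistic", "pval", "padj")),
              i, sep = "\t", quote = FALSE, row.names = FALSE)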

Note: this overwrites the existing files. To create new files instead:

for (i in list.files(pattern = "^DESeq2_result_"))
  write.table(read.table(i, col.names = c("gene_id", "baseMean", "log2FC",
                                          "SD", "WaldStatistic", "pval", "padj")),
              paste0(i, ".fixed.txt"),
              sep = "\t", quote = FALSE, row.names = FALSE)
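
One caveat about the pattern argument: list.files() treats it as a regular expression, not a shell wildcard, so a trailing `*` means "zero or more of the preceding character" rather than "anything". A small sketch of the difference, using glob2rx() from utils to translate a wildcard into a regex; the file names below are invented purely for illustration:

```r
# glob2rx() (utils, attached by default) converts a shell wildcard to a regex.
rx <- glob2rx("DESeq2_result_*")
print(rx)

# Made-up file names, only to show which ones each pattern would select.
candidates <- c("DESeq2_result_A.txt",
                "old_DESeq2_result_A.txt",
                "DESeq2_results.txt")

# Unanchored regex: "_*" matches zero or more underscores, anywhere in the name.
print(grepl("DESeq2_result_*", candidates))

# Converted wildcard: only names that start with "DESeq2_result_".
print(grepl(rx, candidates))
```

With the raw wildcard string all three names match, while the converted pattern keeps only the first one, which is probably what was intended.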

Thanks a lot! It is really appreciated.

5.8 years ago

This creates a new file with headers for each input file. Change the gsub pattern to match your file extension before using it.

col_names <- c("gene_id", "baseMean", "log2FC", "SD", "WaldStatistic", "pval", "padj")
for (file in list.files(pattern = "^DESeq2_result")) {
  # Strip the old extension and write to "updated_<name>.tsv"
  out_name <- paste0("updated_", gsub(pattern = "\\.txt$", "", file), ".tsv")
  write.table(read.table(file, col.names = col_names), out_name,
              col.names = TRUE, row.names = FALSE, sep = "\t")
}

Thank you too, arup!
