Removing columns from multiple files in R
4.3 years ago

I have written some code to remove columns from multiple text files.

file_list <- list.files(pattern = "\\.txt$")
dfList <- lapply(file_list, function(f) {
    df <- read.table(f, header=TRUE, sep="\t", stringsAsFactors=FALSE)
    # drop only the columns whose names match exactly
    df[grep("^(adj\\.P\\.Val|P\\.Value|t|B)$", names(df), invert = TRUE)]
})

finaldf <- do.call(rbind, dfList)

But I need the files to stay separate, as they are. rbind merges all of the files into one. How can I remove just these columns from every file while otherwise keeping each file as it is? Thanks in advance.

R microarray • 2.9k views

After the lapply command you should end up with a list of data.frames that each only have the selected columns. What did you want to do with this list exactly?


I want to remove the columns from each file and otherwise keep the files as they are. I want them to stay separate.


So do you want to save them as new files then?


Yes, but each one should be a separate file, not all combined into one.

4.3 years ago

I use the walk function from the tidyverse package purrr here because we only need the side effect of writing each file, and we don't care about returning a list from the loop.

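library("purrr")
library("dplyr")

# match only file names that end in .txt
# (on a rerun this would also pick up the filtered_ files written below)
file_list <- list.files(pattern = "\\.txt$")

# write a filtered copy of each file; the originals are left untouched
walk(file_list, function(f) {
  file_name <- paste0("filtered_", basename(f))
  read.table(f, header=TRUE, sep="\t", stringsAsFactors=FALSE) %>%
    select(-adj.P.Val, -P.Value, -t, -B) %>%
    write.table(file_name, sep="\t", col.names=TRUE, row.names=FALSE, quote=FALSE)
})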

But the output still has the same columns, and it does not contain the whole file. What should I do?

ID       adj.P.Val  P.Value   t      B         logFC  Gene.symbol
8156043  7.92e-08   4.54e-12  -35.8  17.12755  -1.87  PSAT1
8114249  7.92e-08   1.63e-11  -31.5  16.233    -2     CXCL14
7912975  9.5e-08    2.45e-11  -30.3  15.93407  -1.33  ALDH4A1

I edited my answer so that it should work based on your regex. If it doesn't work, can you post the names of the columns that you want to remove here?
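For reference, here is a minimal sketch of a regex-based column filter. It only illustrates why anchoring the pattern matters and is not necessarily what the edited answer used. The column names "Beta" and "treatment" are hypothetical, added only to show what an unanchored "t" or "B" would also match.

```r
# Column names from the posted file, plus two hypothetical ones
# ("Beta", "treatment") that merely contain the letters B and t
cols <- c("ID", "adj.P.Val", "P.Value", "t", "B", "logFC",
          "Gene.symbol", "Beta", "treatment")

# Unanchored: any name containing "t" or "B" counts as a match and is dropped
unanchored <- grep("(adj.P.Val|P.Value|t|B)", cols, invert = TRUE, value = TRUE)

# Anchored, with the dots escaped: only exact column names are dropped
anchored <- grep("^(adj\\.P\\.Val|P\\.Value|t|B)$", cols, invert = TRUE, value = TRUE)

print(unanchored)
print(anchored)
```

With the posted columns both patterns give the same result, which is probably why the original regex worked. The anchored form is just safer if other files have extra columns.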


"adj.P.Val" "P.Value" "t" "B"

Thanks, it's working. The previous version also works.


Can you edit this to only include the column names you want to remove?


I edited my answer, check to see if it works.


Hello

I am trying to use this code to delete specific columns from multiple CSV files. I changed .txt to .csv and replaced the column names with the ones I need to omit, but I get "Error: Can't subset columns that don't exist. x Column Event doesn't exist." I tried different column names and I get the same error each time. What should I do?

Thank you


@paramount.amin, please post this as a new question.
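For anyone who hits the same error: select() stops when a named column is absent from the data frame. One possible cause (an assumption, since the files were not shown) is that the CSV was still read with sep="\t", which leaves the whole header as a single column name. Checking names() on one file would show this. Separately, any_of() from dplyr drops only the columns that are actually present. A small sketch with made-up data:

```r
library(dplyr)

# Made-up data frame; "Event" and most of the other names are deliberately absent
df <- data.frame(ID = 1:3,
                 logFC = c(-1.87, -2, -1.33),
                 B = c(17.1, 16.2, 15.9))

cols_to_drop <- c("adj.P.Val", "P.Value", "t", "B", "Event")

# Inspect the real column names first; a wrong separator shows up here
print(names(df))

# any_of() ignores names that do not exist instead of raising an error
filtered <- df %>% select(-any_of(cols_to_drop))
print(names(filtered))
```

This silently skips missing columns, so it can also hide a typo in a column name. Checking names() first is still worthwhile.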

4.3 years ago
Ali T. A. ▴ 30

You have already solved the problem, but in the last call you bind the data frames together by producing finaldf, so you get all of them combined into one. What you need to do instead is write every data frame in the list to a separate file, or work with them separately. For example, dfList[[1]] gives you the reduced contents of the first data frame/file:

print(dfList[[1]])
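
Writing each element of the list to its own file could look like the sketch below, in base R only. The temporary directory, the demo files a.txt and b.txt, and the "filtered_" prefix are assumptions made so the example runs on its own. With real data you would point list.files() at your own folder.

```r
# Hypothetical demo data: two small tab-separated files in a temporary directory
demo_dir <- tempfile("demo_")
dir.create(demo_dir)
demo <- data.frame(ID = c(8156043, 8114249),
                   adj.P.Val = c(7.92e-08, 7.92e-08),
                   P.Value = c(4.54e-12, 1.63e-11),
                   t = c(-35.8, -31.5),
                   B = c(17.12755, 16.233),
                   logFC = c(-1.87, -2),
                   Gene.symbol = c("PSAT1", "CXCL14"))
for (nm in c("a.txt", "b.txt")) {
  write.table(demo, file.path(demo_dir, nm), sep = "\t",
              row.names = FALSE, quote = FALSE)
}

file_list <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
drop_cols <- c("adj.P.Val", "P.Value", "t", "B")

# Read each file and drop the unwanted columns by exact name
dfList <- lapply(file_list, function(f) {
  df <- read.table(f, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
  df[, !(names(df) %in% drop_cols), drop = FALSE]
})
names(dfList) <- basename(file_list)

# Write every data frame to its own file instead of rbind-ing them
out_dir <- file.path(demo_dir, "filtered")
dir.create(out_dir)
for (nm in names(dfList)) {
  write.table(dfList[[nm]], file.path(out_dir, paste0("filtered_", nm)),
              sep = "\t", row.names = FALSE, quote = FALSE)
}
```

Naming the list elements after the input files keeps the link between each data frame and the file it came from, so no rbind step is needed.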