Question

How to remove NAs in multiple CSV files in a folder

0

4.7 years ago

paramount.amin • 0

I have a folder containing 15000 CSV files and I need to remove all NAs in all those files.

R • 2.3k views

ADD COMMENT • link updated 4.7 years ago by zx8754 12k • written 4.7 years ago by paramount.amin • 0

1

What do you mean by "remove NAs"?

Replace NAs with ""?
Remove rows with NAs?

ADD REPLY • link 4.7 years ago by cpad0112 21k

0

I need to remove rows with NAs.

ADD REPLY • link 4.7 years ago by paramount.amin • 0

0

You have to write a script that iterates over each CSV file and drops the rows that contain NA.

ADD REPLY • link 4.7 years ago by shoujun.gu ▴ 380

0

Do you want to remove any rows/columns with NA values, or replace NA with something? For this question you may want to include an example of what one of the files looks like.

ADD REPLY • link 4.7 years ago by rpolicastro 13k

score 1 · Answer 1 · 2020-09-03

If the files do not have headers, try running the following command (back up your files and create a directory named "output" before you proceed):

$ parallel  'grep -v "NA" {}  > output/new_{.}.csv' ::: *.csv

If the files have headers, install tsv-utils and run the following command:

$ parallel  'keep-header {} -- grep -v "NA" > output/new_{.}.csv' ::: *.csv
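One caveat with the commands above: grep -v "NA" drops every line that contains the substring NA anywhere, so a row holding a value such as DNA2 or NANOG would also be removed. Below is a minimal sketch of a stricter filter that only drops rows where a whole field is exactly NA. It assumes simple CSV files with one header line and no quoted fields containing commas. The function name drop_na_rows is made up for illustration, and the new_ prefix just mirrors the naming used above.

```shell
#!/bin/sh
# Sketch: drop data rows where any comma-separated field is exactly "NA".
# Assumption: simple CSVs with one header line and no quoted commas.
# drop_na_rows is a hypothetical helper name, not part of any tool.
drop_na_rows() {
    awk -F',' '
        NR == 1 { print; next }          # always keep the header line
        {
            for (i = 1; i <= NF; i++)    # scan every field in the row
                if ($i == "NA") next     # exact match only, so "DNA2" survives
            print
        }
    ' "$1"
}

mkdir -p output
for f in *.csv; do
    [ -e "$f" ] || continue              # skip the literal "*.csv" if nothing matches
    drop_na_rows "$f" > "output/new_$f"
done
```

For files without a header, the NR == 1 line can be removed. Files with Windows line endings would need the trailing carriage return stripped first, otherwise an NA in the last column is not recognised.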

score 0 · Answer 2 · 2020-09-03

0

4.7 years ago

zx8754 12k

This job is probably more suitable for bash, but here is a way to do it in R: find all the files, keep only complete cases (remove any row that has an NA), then write the result out under a new file name.

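library(data.table)

# Loop over every CSV file in the folder (full.names = TRUE returns full paths).
for(i in list.files("path/to/files", pattern = ".*\\.csv", full.names = TRUE)){
  d <- fread(i)
  # Keep only the rows without any NA and write them next to the original file.
  fwrite(d[ complete.cases(d), ], file = paste0(i, ".clean.csv"))
}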

ADD COMMENT • link 4.7 years ago by zx8754 12k

0

R uses POSIX extended regex, so * means "zero or more of the preceding element", and . means any character except newline. Your regex as written should most likely be .*\\.csv; however, csv$ should be sufficient.

ADD REPLY • link 4.7 years ago by rpolicastro 13k

0

I am by no means a regex expert, but "*.csv" works fine for me. In any case, OP can test and amend as needed.

ADD REPLY • link 4.7 years ago by zx8754 12k

1

It will work, but not as intended or expected. It's better to use properly formatted regular expressions to avoid capturing Mechanicsville, Virginia.
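To make the point concrete, here is a small sketch using grep -E, which also takes POSIX extended regular expressions. R's engine may differ in details, so treat it as an illustration rather than a statement about list.files. The file names are made up, and because a pattern that starts with * is not portable across regex engines, the loose pattern is written as .csv (any character followed by csv).

```shell
#!/bin/sh
# Hypothetical file names, used only to compare the two patterns.
names='data.csv
results.csv.bak
Mechanicsville.txt
notes_csv'

# Loose pattern: unescaped dot, no anchor. Matches "csv" after any character.
loose=$(printf '%s\n' "$names" | grep -E '.csv')

# Strict pattern: literal dot, anchored to the end of the name.
strict=$(printf '%s\n' "$names" | grep -E '\.csv$')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

The loose pattern picks up all four names, including Mechanicsville.txt (it contains "icsv"), while the strict pattern only keeps data.csv.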

ADD REPLY • link 4.7 years ago by rpolicastro 13k

0

OK, convinced :) updated the post, thank you.

ADD REPLY • link 4.7 years ago by zx8754 12k

0

Take it from someone who's had all kinds of strange, unintended regex captures: it's worth making sure the pattern is properly formatted, haha.