I have a folder containing 15000 CSV files and I need to remove all NAs in all those files.
If the files do not have headers, try the following command (back up your files and create a directory named "output" before you proceed). The -w flag makes grep match NA only as a whole word, so values such as NANCY are not treated as missing:
$ parallel 'grep -vw "NA" {} > output/new_{.}.csv' ::: *.csv
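If the files have headers, install tsv-utils and run the following command (keep-header passes the first line through unchanged and applies grep only to the remaining lines):
$ parallel 'keep-header {} -- grep -vw "NA" > output/new_{.}.csv' ::: *.csv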
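grep tests each line as text rather than field by field, so it can still drop a row where NA only appears inside a longer text value. A minimal sketch with awk that compares every field exactly is shown here. It assumes plain comma-separated files with a header line, no quoted fields containing commas, Unix line endings, and missing values written exactly as NA; the function name drop_na_rows and the directory names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy every CSV from directory $1 into directory $2,
# dropping each data row in which at least one field is exactly "NA".
drop_na_rows() {
    local in_dir=$1 out_dir=$2 f
    mkdir -p "$out_dir"
    for f in "$in_dir"/*.csv; do
        [ -e "$f" ] || continue   # the glob matched nothing
        # Line 1 is assumed to be the header and is always kept.
        # Any other line is skipped as soon as one field equals "NA".
        awk -F',' '
            NR == 1 { print; next }
            { for (i = 1; i <= NF; i++) if ($i == "NA") next; print }
        ' "$f" > "$out_dir/$(basename "$f")"
    done
}
```

Usage would look like `drop_na_rows path/to/files output`. Writing into a separate directory leaves the original files untouched, so the result can be checked before anything is replaced.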
This job is probably better suited to bash, but here is a way to do it in R: find all the files, keep only complete cases (that is, remove any row that has an NA), then write the result under a new filename.
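library(data.table)
# Loop over every file in the folder whose name ends in ".csv"
for(i in list.files("path/to/files", pattern = "\\.csv$", full.names = TRUE)){
  d <- fread(i)
  # Keep only rows without any NA and write them next to the input, e.g. "x.csv.clean.csv"
  fwrite(d[ complete.cases(d), ], file = paste0(i, ".clean.csv"))
}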
By "remove NAs" I mean: I need to remove the rows that contain NAs.
You have to write a script that iterates over each CSV file and drops the rows that contain NA.
Do you want to remove any rows/columns with NA values, or replace NA with something? For this question you may want to include an example of what one of the files looks like.