Remove NA from file while reading
2.2 years ago
gubrins ▴ 350

Hi,

I have a large dataset (37 GB) with a lot of missing data coded as NA. I know how to remove NAs once the file has been read, but I was wondering whether there is a way to remove them while reading the file (or without reading it at all), so that the loaded data is already free of NAs. Otherwise I have to read the whole file, which, as you can imagine, needs a lot of RAM.

Thanks in advance!!

python R • 1.2k views

You don't need to read the entire file into memory to remove the NAs. You can read and write the file line by line, dropping the NA entries as you go, using Python or R. You may also be able to edit the file in place with sed.
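
A minimal Python sketch of the line-by-line idea, in case it helps. It assumes a tab-delimited text file in which missing values appear as the literal field NA, and that "removing NAs" means dropping every row with at least one NA field. The function name drop_na_rows and its parameters are made up for illustration.

```python
import csv


def drop_na_rows(src_path, dst_path, delimiter="\t", na_token="NA"):
    """Copy src_path to dst_path, skipping rows that contain an NA field.

    Only one row is held in memory at a time, so the file size does not matter.
    Returns the number of rows written (the header counts as a row).
    """
    kept = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter=delimiter)
        writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
        for row in reader:
            # Compare whole fields, so values such as "NAN" or "RNA" are kept.
            if na_token not in row:
                writer.writerow(row)
                kept += 1
    return kept
```

You would call it as drop_na_rows("bigfile.txt", "bigfile.noNA.txt") and then load the smaller output file as usual. Adjust delimiter if your file is comma- or space-separated.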

2.2 years ago

Assuming you want a solution in R, I would do something along these lines using data.table::fread:

library(data.table)

dat <- fread(cmd="grep -v -w 'NA' bigfile.txt")

Of course, you would need to adapt the shell command passed to cmd to your case (as written, grep -v -w 'NA' drops every line that contains NA as a whole word), but hopefully you get the idea.

2.2 years ago

Use a stream-oriented approach to process your data, for example with readr in R:

https://readr.tidyverse.org/reference/read_lines.html
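
For readers working in Python rather than R, the same stream-oriented idea can be sketched with the standard library alone. This is only an illustration under assumptions: the file is tab-delimited, missing values are the literal field NA, and the helper names iter_chunks and count_complete_rows are hypothetical.

```python
from itertools import islice


def iter_chunks(path, chunk_size=100_000):
    """Yield lists of at most chunk_size lines; the whole file is never loaded."""
    with open(path) as handle:
        while True:
            chunk = list(islice(handle, chunk_size))
            if not chunk:
                break
            yield chunk


def count_complete_rows(path, delimiter="\t", na_token="NA", chunk_size=100_000):
    """Count rows (header included) in which no field equals na_token."""
    total = 0
    for chunk in iter_chunks(path, chunk_size):
        # Each chunk is processed and then discarded before the next is read.
        total += sum(
            na_token not in line.rstrip("\n").split(delimiter) for line in chunk
        )
    return total
```

The counting step is just a placeholder: any per-chunk work (filtering, summarising, writing out) can go in its place, and memory use stays bounded by chunk_size.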
