Hi,
I have a big dataset (37 GB) with a lot of missing data coded as NA. I know how to remove NAs once the file has been read, but I was wondering if there is any way to remove them while reading the file (or without reading it at all), so that what gets loaded is already free of NAs. Otherwise I have to read the whole file, and as you can imagine that will need a lot of RAM.
Thanks in advance!!
You don't need to read the entire file into memory to remove the NAs. You can read and write the file line by line, dropping the NAs as you go, using Python or R. You may also be able to edit the file in place with sed.
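As a rough illustration of the line-by-line approach, here is a minimal Python sketch. It makes some assumptions that may not match your data: the file is comma-delimited text, missing values appear as the literal field NA, and "removing NAs" means dropping every row that contains at least one NA. The function name drop_na_rows and its parameters are made up for this example.

```python
import csv

def drop_na_rows(src_path, dst_path, delimiter=",", na_token="NA"):
    """Copy src_path to dst_path, skipping rows that have a field equal to na_token.

    Only one row is held in memory at a time, so memory use does not grow
    with file size. Returns the number of rows written (header included).
    """
    kept = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter=delimiter)
        # Assumption: plain "\n" line endings are wanted in the output.
        writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
        for row in reader:
            # Exact field match, so a value such as "NAME" is not treated as missing.
            if na_token in row:
                continue
            writer.writerow(row)
            kept += 1
    return kept
```

You would call it as drop_na_rows("big.csv", "big_clean.csv") (both file names are placeholders) and then load the smaller cleaned file as usual. If your file is tab-separated, pass delimiter="\t".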