dplyr and the %>% operator
I highly recommend looking through the 'dplyr' package as well as the other tools from the 'tidyverse'. dplyr is an extremely powerful tool for cleaning and summarizing data. It makes the '%>%' operator available, which pipes the output of one function into the next one as its first argument. Here is an illustration of the operator:
y <- F(G(H(x)))
x %>%
H() %>%
G() %>%
F() -> y
Both approaches give exactly the same result, but piping makes the code read closer to the way a human thinks: "I take x, put it into the H function, then the result goes to the G function, afterwards we apply the F function, and finally we assign the value to the y variable".
The %>% operator is defined in the 'magrittr' package (dplyr re-exports it), and it is especially useful for the data frame operations defined in 'dplyr'.
Going back to your question about removing the columns, here is the code:
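To make the equivalence concrete, here is a minimal sketch in which abs(), sqrt() and sum() stand in for H, G and F. These three functions and the vector x are my own choice for illustration, not part of the original question.

```r
library(magrittr)  # provides %>%; loading dplyr makes it available too

x <- c(-4, 9, -16)  # made-up input for illustration

# Nested call: has to be read from the inside out
y_nested <- sum(sqrt(abs(x)))

# Piped call: reads from top to bottom
x %>%
  abs() %>%
  sqrt() %>%
  sum() -> y_piped

identical(y_nested, y_piped)
```

Since R 4.1.0 base R also has a native pipe, |>, which behaves the same way for simple calls like these.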
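library(dplyr)

# TRUE for every column that contains nothing but NA
onlyNAcolumns_idx <- data %>%
  is.na() %>%
  apply(MARGIN = 2, FUN = all)

# Keep the other columns; drop = FALSE keeps a data frame even if only one column is left
data[, !onlyNAcolumns_idx, drop = FALSE]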
How to read the code that calculates onlyNAcolumns_idx:
- We take the 'data' object.
- Then we apply the is.na() function. The result is a logical matrix with the same dimensions as 'data'; it contains TRUE wherever the original 'data' has an NA and FALSE everywhere else.
- We apply the function all() to every column (MARGIN = 2). It returns TRUE only if every value in the column is TRUE, that is, if the whole column is NA.
That's it! The length of 'onlyNAcolumns_idx' is the same as the number of columns in 'data':
length(onlyNAcolumns_idx) == ncol(data)
Finally, you just use logical subsetting on the data frame: negating the index with ! keeps the columns that are not entirely NA.
Once you get used to dplyr, life in R becomes easier. It just takes some time to adapt.
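The same column filter can also be written with dplyr's own verbs. This is only a sketch: it assumes dplyr 1.0.0 or later (for where() inside select()), and the small data frame below is made up for illustration.

```r
library(dplyr)

# Hypothetical data frame for illustration; column 'b' is entirely NA
data <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, NA, NA),
  c = c("x", "y", NA)
)

# Keep only the columns that are not entirely NA
cleaned <- data %>%
  select(where(~ !all(is.na(.x))))

names(cleaned)
```

Columns 'a' and 'c' each contain some NA values but are kept, because only columns in which every value is NA are dropped.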
Example
x <- matrix(seq(25), ncol = 5)
x[2,] <- NA
x[,4] <- NA
x[4,2] <- NA
x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    6   11   NA   21
[2,]   NA   NA   NA   NA   NA
[3,]    3    8   13   NA   23
[4,]    4   NA   14   NA   24
[5,]    5   10   15   NA   25
onlyNAcolumns_idx <- x %>%
  is.na() %>%
  apply(MARGIN = 2, FUN = all)
onlyNAcolumns_idx
[1] FALSE FALSE FALSE TRUE FALSE
( y <- x[,!onlyNAcolumns_idx] )
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   21
[2,]   NA   NA   NA   NA
[3,]    3    8   13   23
[4,]    4   NA   14   24
[5,]    5   10   15   25
onlyNArows_idx <- y %>%
  is.na() %>%
  apply(MARGIN = 1, FUN = all)
onlyNArows_idx
[1] FALSE TRUE FALSE FALSE FALSE
y[!onlyNArows_idx,]
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   21
[2,]    3    8   13   23
[3,]    4   NA   14   24
[4,]    5   10   15   25
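If you need both steps often, they can be wrapped into one function. Below is a minimal sketch in base R that counts NAs with colSums() and rowSums() instead of apply(); the helper name drop_all_na is made up for this example and is not a function from any package.

```r
# Hypothetical helper (the name is invented for this sketch):
# removes columns that are entirely NA, then rows that are entirely NA
drop_all_na <- function(m) {
  # A column is kept if it has fewer NAs than there are rows
  keep_cols <- colSums(is.na(m)) < nrow(m)
  m <- m[, keep_cols, drop = FALSE]
  # A row is kept if it has fewer NAs than there are remaining columns
  keep_rows <- rowSums(is.na(m)) < ncol(m)
  m[keep_rows, , drop = FALSE]
}

# Same matrix as in the example
x <- matrix(seq(25), ncol = 5)
x[2, ] <- NA
x[, 4] <- NA
x[4, 2] <- NA

result <- drop_all_na(x)
dim(result)

# For comparison: complete.cases() drops every row that has at least one NA
nrow(x[complete.cases(x), , drop = FALSE])
```

The last line shows why complete.cases() and na.omit() do not give the desired result here: column 4 is NA in every row, so they would remove all rows, whereas the helper only removes rows and columns in which every value is NA.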
I suggest adding the commands you've tried. For columns, see link, and for rows, see link
In fact it was not so hard to find ;-)
Hi Maciej,
Thank you for your prompt reply. Indeed, it was not difficult to find, but I've tried all the suggestions I could find on the forums. These include the commands you just sent me, na.omit, na.rm = TRUE, x[complete.cases(x), ], and many more that I didn't save because they didn't give me the desired result.
Thanks again!