Subsetting/filtering a tibble by a combination of data stored in a second tibble
1
0
Entering edit mode
3.0 years ago
Mathias_H ▴ 20

Hello everyone, I am looking for the easiest way to subset a tibble by data from another tibble. Let me explain:

> Tibble_1
# A tibble: 10 x 5
   Mushroom M_Nr  color  size surface
   <chr>    <chr> <chr> <dbl> <chr>  
 1 K570     12    brown     7 ABC    
 2 K570     18    brown     9 CDF    
 3 K570     33    brown    10 FDA    
 4 K830     1     brown    14 BCA    
 5 K830     23    brown    16 BBF    
 6 K830     44    brown    15 FCA    
 7 K830     45    brown    17 ABC    
 8 K830     48    brown    14 CDF    
 9 K480     7     brown    14 FDA    
10 K480     34    brown     9 CDF 

> Tibble_2
# A tibble: 10 x 2
    size surface
   <dbl> <chr>  
 1    24 ABC    
 2     9 CDF    
 3     8 CDF    
 4    12 FDA    
 5    13 FDA    
 6    15 FDA    
 7    17 ABC    
 8    12 FCA    
 9    14 FCA    
10     9 CDF   

I want to filter Tibble_1 to keep only the rows that have a combination of size and surface, that cannot be found in Tibble_2. For example, I want to keep row 1 of Tibble_1, because there is no combination of size = 7 and surface = ABC in Tibble_2. On the other hand, I do not want to keep row 2 of Tibble_1 because there is a row in Tibble_2 with the same combination of size and surface (in this case also row 2). I cannot filter for size and surface consecutively, because then I would lose some rows that I actually want to keep. (E.g. if I filter for size first, I would lose row 9 in Tibble_1, even though there is no size = 14/surface = FDA in Tibble_2)

I am mostly working with tidyverse packages, so I would be happy to find a solution within the functions of these packages. Can someone help me?

Thanks a lot!

tidyverse tibble R dplyr • 1.3k views
ADD COMMENT
0
Entering edit mode

Just make a new column and filter rows based on it.

ADD REPLY
1
Entering edit mode
3.0 years ago
4galaxy77 2.9k

This seems like a job for dplyr::anti_join. I've not checked the output though, you might want to make sure it's what you want.

library(dplyr)
anti_join(t1, t2)
Joining, by = c("size", "surface")
# A tibble: 7 × 5
  V2       V3 V4     size surface
  <chr> <int> <chr> <int> <chr>  
1 K570     12 brown     7 ABC    
2 K570     33 brown    10 FDA    
3 K830      1 brown    14 BCA    
4 K830     23 brown    16 BBF    
5 K830     44 brown    15 FCA    
6 K830     48 brown    14 CDF    
7 K480      7 brown    14 FDA

Or, a slightly less elegant version:

t1 %>% 
    mutate(comb = paste(size, surface, sep="_")) %>% 
    filter(!comb %in% paste(t2$size, t2$surface, sep="_"))
ADD COMMENT
0
Entering edit mode

Thank you! This is exactly what I was looking for! Didn't know about the anti_join() function.

ADD REPLY

Login before adding your answer.

Traffic: 1522 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6