Hello everyone, I am looking for the easiest way to subset a tibble by data from another tibble. Let me explain:
> Tibble_1
# A tibble: 10 x 5
   Mushroom M_Nr  color  size surface
   <chr>    <chr> <chr> <dbl> <chr>
 1 K570     12    brown     7 ABC
 2 K570     18    brown     9 CDF
 3 K570     33    brown    10 FDA
 4 K830     1     brown    14 BCA
 5 K830     23    brown    16 BBF
 6 K830     44    brown    15 FCA
 7 K830     45    brown    17 ABC
 8 K830     48    brown    14 CDF
 9 K480     7     brown    14 FDA
10 K480     34    brown     9 CDF
> Tibble_2
# A tibble: 10 x 2
    size surface
   <dbl> <chr>
 1    24 ABC
 2     9 CDF
 3     8 CDF
 4    12 FDA
 5    13 FDA
 6    15 FDA
 7    17 ABC
 8    12 FCA
 9    14 FCA
10     9 CDF
I want to filter Tibble_1 to keep only the rows whose combination of size and surface cannot be found in Tibble_2. For example, I want to keep row 1 of Tibble_1, because there is no combination of size = 7 and surface = ABC in Tibble_2. On the other hand, I do not want to keep row 2 of Tibble_1, because there is a row in Tibble_2 with the same combination of size and surface (in this case also row 2). I cannot filter on size and surface one after the other, because then I would lose some rows that I actually want to keep. (E.g. if I filter on size first, I would lose row 9 in Tibble_1, even though there is no size = 14/surface = FDA combination in Tibble_2.)
I am mostly working with tidyverse packages, so I would be happy to find a solution within the functions of these packages. Can someone help me?
Thanks a lot!
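Just make a new column in both tibbles that combines size and surface into a single key, then filter Tibble_1 to keep only the rows whose key does not appear in Tibble_2.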
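Here is a minimal sketch of that idea, assuming dplyr and tibble are installed. The tibbles are rebuilt from the printed data in the question, and the names key, key_2, result_key and result_join are made up for illustration. The second variant uses dplyr's anti_join() as an alternative that does the same thing without a helper column; it is not part of the original suggestion.

```r
library(dplyr)
library(tibble)

# Rebuild the two tibbles from the printout in the question
Tibble_1 <- tibble(
  Mushroom = c("K570", "K570", "K570", "K830", "K830",
               "K830", "K830", "K830", "K480", "K480"),
  M_Nr     = c("12", "18", "33", "1", "23", "44", "45", "48", "7", "34"),
  color    = "brown",
  size     = c(7, 9, 10, 14, 16, 15, 17, 14, 14, 9),
  surface  = c("ABC", "CDF", "FDA", "BCA", "BBF",
               "FCA", "ABC", "CDF", "FDA", "CDF")
)

Tibble_2 <- tibble(
  size    = c(24, 9, 8, 12, 13, 15, 17, 12, 14, 9),
  surface = c("ABC", "CDF", "CDF", "FDA", "FDA",
              "FDA", "ABC", "FCA", "FCA", "CDF")
)

# Variant 1: combined key column (hypothetical name `key`), then filter
key_2 <- paste(Tibble_2$size, Tibble_2$surface, sep = "_")

result_key <- Tibble_1 %>%
  mutate(key = paste(size, surface, sep = "_")) %>%  # e.g. "7_ABC"
  filter(!key %in% key_2) %>%                        # keep combinations absent from Tibble_2
  select(-key)                                       # drop the helper column again

# Variant 2: anti_join() keeps the rows of Tibble_1 without a match in Tibble_2
result_join <- anti_join(Tibble_1, Tibble_2, by = c("size", "surface"))

print(result_key)
print(result_join)
```

With the data above, both variants should drop rows 2, 7 and 10 of Tibble_1 (9/CDF, 17/ABC, 9/CDF) and keep the other seven. The key column approach relies on both size columns being pasted into text the same way, so anti_join() is probably the safer choice if the numbers are not whole values.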