Question

Finding unique and identical rows between multiple columns of a data frame in R


2.3 years ago

salman_96 ▴ 70

Hi,

I am working on a large data frame that has 4 columns, each holding a different number of entries. I am trying to find the unique and shared pathways across the 4 columns (each column represents a particular day of treatment with a drug). Below is a small example.

Pathways_1Day <- c("blood","kidney","testis","No","bone","liver","intestine","lungs","ABC","pancreas","Yes")

Pathways_2Day <- c("blood","kidney","testis","eyes","bone","cells","intestine","cervix","ABC","pancreas","None")

Pathways_3Day <- c("blood","kidney","vessels","lymph","t-cells","liver","intestine","lungs","ABC","epidermis","None")

df<-data.frame(Pathways_1Day,Pathways_2Day,Pathways_3Day)

I want to get a summary of the number of pathways that are common between the different timepoints (1, 2 and 3 days).

Important: The number of pathways is not the same for each day.

I have tried this:

library(dplyr)
All_pathwayNames <- df %>% group_by_all() %>% count()

But this does not give the output I am after.

There may be several ways to address this. Ideally, I would like matching rows to line up in front of each other across all columns.

Regards

identical unique rows R Dataframe • 2.1k views



What is the expected output for this data?

2.3 years ago by zx8754 12k

score 1 · Answer 1 · 2022-10-31


2.3 years ago

zx8754 12k

Use Reduce to intersect multiple vectors:

Reduce(intersect, list(Pathways_1Day, Pathways_2Day, Pathways_3Day))
# [1] "blood"     "kidney"    "intestine" "ABC"

Related StackOverflow post: How to find common elements from multiple vectors?
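The question also asks for a summary of how many pathways are shared between timepoints. As a minimal sketch of one way to get that with the same `intersect` idea, the nested `sapply` below builds a matrix of pairwise overlap counts. The names `days` and `shared_counts` are introduced here for illustration only and are not from the original posts.

```r
# Example vectors copied from the question
Pathways_1Day <- c("blood","kidney","testis","No","bone","liver","intestine","lungs","ABC","pancreas","Yes")
Pathways_2Day <- c("blood","kidney","testis","eyes","bone","cells","intestine","cervix","ABC","pancreas","None")
Pathways_3Day <- c("blood","kidney","vessels","lymph","t-cells","liver","intestine","lungs","ABC","epidermis","None")

# Hypothetical helper list; a list also works when the vectors differ in length
days <- list(Day1 = Pathways_1Day, Day2 = Pathways_2Day, Day3 = Pathways_3Day)

# For every pair of days, count the pathways present in both
shared_counts <- sapply(days, function(a) {
  sapply(days, function(b) length(intersect(a, b)))
})

# Pathways present on all days, as in the answer above
shared_all <- Reduce(intersect, days)

print(shared_counts)
```

The diagonal holds the number of distinct pathways per day, and the off-diagonal cells hold the pairwise overlaps, so the matrix is symmetric.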


score 1 · Answer 2 · 2022-10-31


2.3 years ago

Basti ★ 2.0k

It seems that you may need an UpSet plot: https://github.com/hms-dbmi/UpSetR
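A minimal sketch of what that could look like for the example data, assuming the UpSetR package is installed. `fromList()` converts a named list of sets into a 0/1 membership table and `upset()` draws the plot; the names `sets` and `membership` are illustrative only.

```r
library(UpSetR)

# Example vectors from the question, gathered in a named list
sets <- list(
  Day1 = c("blood","kidney","testis","No","bone","liver","intestine","lungs","ABC","pancreas","Yes"),
  Day2 = c("blood","kidney","testis","eyes","bone","cells","intestine","cervix","ABC","pancreas","None"),
  Day3 = c("blood","kidney","vessels","lymph","t-cells","liver","intestine","lungs","ABC","epidermis","None")
)

# One row per distinct pathway, one 0/1 column per day
membership <- fromList(sets)

# Bars show how many pathways fall into each combination of days
upset(membership, order.by = "freq")
```

Because the input is a list, the days do not need to contain the same number of pathways.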


score 0 · Answer 3 · 2022-10-31


2.3 years ago

Trivas ★ 1.9k

Not the most elegant, but you can do something like this:

Pathways_1Day[Pathways_1Day %in% Pathways_2Day & Pathways_1Day %in% Pathways_3Day]

[1] "blood"     "kidney"    "intestine" "ABC"


score 0 · Answer 4 · 2022-10-31

As far as your example is concerned, this will produce a sparse data.frame with matching rows in front of each other across all columns.

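# All distinct pathways across every column, sorted alphabetically
pathways_combined <- sort(unique(unlist(df)))

# For each column, keep a pathway in its sorted slot if present, otherwise NA
df2 <-
  as.data.frame(apply(df, 2, function(x, y) {
    y <- factor(y, levels = c(y, NA))
    y[!is.element(y, x)] <- NA
    return(y)
  }, y = pathways_combined))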

But if your pathway vectors differ in length in the first place, you probably cannot start from a data.frame and will instead need to use lapply to loop over a list.
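A minimal sketch of that list-based variant, using short made-up vectors of unequal length rather than the real data. The names `pathway_list`, `all_pathways`, `aligned` and `n_days` are hypothetical and only serve the illustration.

```r
# Toy input: one character vector per day, lengths differ
pathway_list <- list(
  Pathways_1Day = c("blood", "kidney", "liver"),
  Pathways_2Day = c("blood", "bone"),
  Pathways_3Day = c("kidney", "blood", "lungs", "bone")
)

# All distinct pathways, sorted, define the rows of the output
all_pathways <- sort(unique(unlist(pathway_list)))

# For each day, put the pathway in its row if present, otherwise NA
aligned <- as.data.frame(
  lapply(pathway_list, function(x) ifelse(all_pathways %in% x, all_pathways, NA)),
  stringsAsFactors = FALSE
)

# Number of days on which each pathway appears
aligned$n_days <- rowSums(!is.na(aligned))

print(aligned)
```

Rows with `n_days` equal to the number of days are shared by all timepoints, and rows with `n_days` equal to 1 are unique to a single day.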