Question

How to calculate the overlap of peptides between different categories to create a Venn diagram

0

5.8 years ago

ishackm ▴ 110

Hi all,

I have the following dataset:

     TGEClass.known         TGEClass.uknown
1             GVVEVTHDLQK             GVVEVTHDLQK
2           LFYADHPFIFLVR           LFYADHPFIFLVR
3       SALQSINEWAAQTTDGK       SALQSINEWAAQTTDGK
4  AVLSAEQLRDEEVHAGLGELLR  AVLSAEQLRDEEVHAGLGELL

I would like to calculate the number of peptides that are present in both categories and the number that are present in only one.

I have tried to use the vennCounts function from limma, but it only accepts numerical values:

a <- vennCounts(c3)
a
     hw hm hr Counts
[1,]  0  0  0    113
[2,]  0  0  1     18
[3,]  0  1  0      8
[4,]  0  1  1      8
[5,]  1  0  0     12
[6,]  1  0  1      8
[7,]  1  1  0     11
[8,]  1  1  1     22

How can I convert my peptide dataset into a table like the one above so that I can make a Venn diagram? I have searched everywhere I can but have not found a solution.

I would really appreciate it if someone could help me solve this problem.

Many Thanks,

Ishack

ven diagram peptide venn count r • 5.1k views

ADD COMMENT • link updated 5.8 years ago by lieven.sterck 15k • written 5.8 years ago by ishackm ▴ 110

score 1 · Answer 1 · 2019-07-02

1

5.8 years ago

AK ★ 2.2k

Hi Ishack,

Try this:

df <-
  data.frame(
    TGEClass.known = c(
      "GVVEVTHDLQK",
      "LFYADHPFIFLVR",
      "SALQSINEWAAQTTDGK",
      "AVLSAEQLRDEEVHAGLGELLR"
    ),
    TGEClass.uknown = c(
      "GVVEVTHDLQK",
      "LFYADHPFIFLVR",
      "SALQSINEWAAQTTDGK",
      "AVLSAEQLRDEEVHAGLGELL"
    )
  )


# Present in both TGEClass.known and TGEClass.uknown
length(intersect(df$TGEClass.known, df$TGEClass.uknown))

# TGEClass.known only
length(setdiff(df$TGEClass.known, df$TGEClass.uknown))

# TGEClass.uknown only
length(setdiff(df$TGEClass.uknown, df$TGEClass.known))

ADD COMMENT • link 5.8 years ago by AK ★ 2.2k

0

Hi SMK, thanks very much for your answer. How can I get a table like this automatically? It takes quite long to do it manually.

     hw hm hr Counts
[1,]  0  0  0    113
[2,]  0  0  1     18
[3,]  0  1  0      8
[4,]  0  1  1      8
[5,]  1  0  0     12
[6,]  1  0  1      8
[7,]  1  1  0     11
[8,]  1  1  1     22

ADD REPLY • link 5.8 years ago by ishackm ▴ 110

0

What are hw, hm, and hr?

ADD REPLY • link 5.8 years ago by AK ★ 2.2k

0

Sorry, those are meant to say TGEClass.uknown and TGEClass.known. Please ignore hw, hm and hr; I want a table like that for TGEClass.known and TGEClass.uknown.

ADD REPLY • link 5.8 years ago by ishackm ▴ 110

0

Perhaps:

> df.venn <- data.frame(
+   TGEClass.known = c(1, 1, 0),
+   TGEClass.unknown = c(1, 0, 1),
+   Counts = c(length(
+     intersect(df$TGEClass.known, df$TGEClass.uknown)
+   ), length(
+     setdiff(df$TGEClass.known, df$TGEClass.uknown)
+   ), length(
+     setdiff(df$TGEClass.uknown, df$TGEClass.known)
+   ))
+ )
> df.venn
  TGEClass.known TGEClass.unknown Counts
1              1                1      3
2              1                0      1
3              0                1      1
> as.matrix(df.venn)
     TGEClass.known TGEClass.unknown Counts
[1,]              1                1      3
[2,]              1                0      1
[3,]              0                1      1

ADD REPLY • link 5.8 years ago by AK ★ 2.2k

0

Hi SMK, thanks a lot, that's what I was looking for. Just one final question, if you don't mind.

I have a lot of data frames like the one above, but each one has a different number of categories and different category names. Would it be possible to run intersect and setdiff across all the columns automatically?

ADD REPLY • link 5.8 years ago by ishackm ▴ 110

0

I got an idea from the venn function in gplots; here it is demonstrated with 2 sets and 3 sets:

> library(gplots)
> # Two sets
> df1 <-
+   data.frame(
+     TGEClass.known = c(
+       "GVVEVTHDLQK",
+       "LFYADHPFIFLVR",
+       "SALQSINEWAAQTTDGK",
+       "AVLSAEQLRDEEVHAGLGELLR"
+     ),
+     TGEClass.uknown = c(
+       "GVVEVTHDLQK",
+       "LFYADHPFIFLVR",
+       "SALQSINEWAAQTTDGK",
+       "AVLSAEQLRDEEVHAGLGELL"
+     )
+   )
> venn.tab1 <- venn(as.list(df1), show.plot = FALSE)
> attr(venn.tab1, "intersections") <- NULL
> attr(venn.tab1, "class") <- NULL
> print(venn.tab1)
   num TGEClass.known TGEClass.uknown
00   0              0               0
01   1              0               1
10   1              1               0
11   3              1               1
> # Three sets
> df2 <-
+   data.frame(
+     TGEClass.set1 = c(
+       "GVVEVTHDLQK",
+       "LFYADHPFIFLVR",
+       "SALQSINEWAAQTTDGK",
+       "AVLSAEQLRDEEVHAGLGELLR"
+     ),
+     TGEClass.set2 = c(
+       "GVVEVTHDLQK",
+       "LFYADHPFIFLVR",
+       "SALQSINEWAAQTTDGK",
+       "AVLSAEQLRDEEVHAGLGELL"
+     ),
+     TGEClass.set3 = c(
+       "GVVEVTHDLQK",
+       "LFYADHPFIFLVR",
+       "SALQSINEWAAQTTDGKK",
+       "AVLSAEQLRDEEVHAGLGELLR"
+     )
+   )
> venn.tab2 <- venn(as.list(df2), show.plot = FALSE)
> attr(venn.tab2, "intersections") <- NULL
> attr(venn.tab2, "class") <- NULL
> print(venn.tab2)
    num TGEClass.set1 TGEClass.set2 TGEClass.set3
000   0             0             0             0
001   1             0             0             1
010   1             0             1             0
011   0             0             1             1
100   0             1             0             0
101   1             1             0             1
110   1             1             1             0
111   2             1             1             1
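
For data frames with any number of columns, a similar table can also be sketched in base R without gplots. The helper name membership_counts is hypothetical, and the sketch assumes every column holds one peptide set with blank cells already removed. Unlike the venn output above, it lists only the patterns that actually occur, so the all-zero row is absent:

```r
# Sketch only: membership_counts is a hypothetical helper, not part of gplots or limma.
membership_counts <- function(df) {
  # Treat each column as a set of unique peptide strings
  sets <- lapply(as.list(df), function(x) unique(as.character(x)))
  peptides <- unique(unlist(sets, use.names = FALSE))
  # 0/1 matrix: one row per unique peptide, one column per set
  member <- vapply(sets, function(s) as.integer(peptides %in% s),
                   integer(length(peptides)))
  member <- matrix(member, nrow = length(peptides),
                   dimnames = list(NULL, names(sets)))
  # Sum a column of ones over every observed 0/1 pattern
  aggregate(Counts ~ ., data = data.frame(member, Counts = 1L), FUN = sum)
}

df <- data.frame(
  TGEClass.known  = c("GVVEVTHDLQK", "LFYADHPFIFLVR",
                      "SALQSINEWAAQTTDGK", "AVLSAEQLRDEEVHAGLGELLR"),
  TGEClass.uknown = c("GVVEVTHDLQK", "LFYADHPFIFLVR",
                      "SALQSINEWAAQTTDGK", "AVLSAEQLRDEEVHAGLGELL"),
  stringsAsFactors = FALSE
)
counts <- membership_counts(df)
print(counts)
```

Because the helper loops over whatever columns it receives, the same call should work for data frames with three or more categories.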

ADD REPLY • link 5.8 years ago by AK ★ 2.2k

0

Hi SMK, unfortunately I have just found that I can't draw a Venn diagram for more than 5 categories.

Can you help me create a df that looks like this please?

TGE-Class     Count
T1              1
T2              1
Both            6

Thanks very much

ADD REPLY • link 5.8 years ago by ishackm ▴ 110

1

> library(gplots)
> df <-
+   data.frame(
+     T1 = c(
+       "GVVEVTHDLQK",
+       "LFYADHPFIFLVR",
+       "SALQSINEWAAQTTDGK",
+       "SALQSINEWAAQTTDGLL",
+       "SALQSINEWAAQTTDGTT",
+       "SALQSINEWAAQTTDGQQ",
+       "AVLSAEQLRDEEVHAGLGELLR"
+     ),
+     T2 = c(
+       "GVVEVTHDLQK",
+       "LFYADHPFIFLVR",
+       "SALQSINEWAAQTTDGK",
+       "SALQSINEWAAQTTDGLL",
+       "SALQSINEWAAQTTDGTT",
+       "SALQSINEWAAQTTDGQQ",
+       "AVLSAEQLRDEEVHAGLGELL"
+     )
+   )
> venn.tab <- venn(as.list(df), show.plot = FALSE)
> t(t(unlist(lapply(attr(venn.tab, "intersections"), length))))
      [,1]
T1       1
T2       1
T1:T2    6
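
If a plain data frame in the requested TGE-Class/Count layout is preferred, the same counts can be sketched in base R. The function name region_counts and the column name TGE.Class are hypothetical; labels join set names with ":" to mirror the venn output above:

```r
# Sketch only: region_counts is a hypothetical helper.
region_counts <- function(df) {
  sets <- lapply(as.list(df), function(x) unique(as.character(x)))
  peptides <- unique(unlist(sets, use.names = FALSE))
  # For each peptide, paste together the names of the sets that contain it
  label <- vapply(peptides, function(p) {
    in_set <- vapply(sets, function(s) p %in% s, logical(1))
    paste(names(sets)[in_set], collapse = ":")
  }, character(1))
  counts <- table(label)
  data.frame(TGE.Class = names(counts), Count = as.integer(counts),
             stringsAsFactors = FALSE)
}

shared <- c("GVVEVTHDLQK", "LFYADHPFIFLVR", "SALQSINEWAAQTTDGK",
            "SALQSINEWAAQTTDGLL", "SALQSINEWAAQTTDGTT", "SALQSINEWAAQTTDGQQ")
df <- data.frame(T1 = c(shared, "AVLSAEQLRDEEVHAGLGELLR"),
                 T2 = c(shared, "AVLSAEQLRDEEVHAGLGELL"),
                 stringsAsFactors = FALSE)
res <- region_counts(df)
print(res)
```

This avoids the plotting limit entirely, since no diagram is drawn and the number of columns is not restricted.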

ADD REPLY • link 5.8 years ago by AK ★ 2.2k

0

Hi SMK,

Thanks very much for your quick response, I have been trying all day to fix this. You are a life saver!

ADD REPLY • link 5.8 years ago by ishackm ▴ 110

0

Hi SMK, sorry for the late follow-up. Is there a way to get the number of unique peptides in each category when there are blank cells in the columns, please?

Unfortunately, the length code counts the blank cells as unique peptides.

ADD REPLY • link 5.7 years ago by ishackm ▴ 110

0

Hi ishackm,

You can remove the empty elements from the list before you call venn:

library(gplots)
l <- as.list(df)
# Drop blank cells so they are not counted as peptides
l <- lapply(l, function(x) x[x != ""])
venn.tab <- venn(l, show.plot = FALSE)
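
Cells imported from a CSV can also be NA or contain only spaces. A slightly more defensive variant of the filter, shown as a sketch (the helper name clean_set and the small data frame are made up for illustration), trims whitespace and drops NA and duplicate entries as well:

```r
# Sketch only: clean_set is a hypothetical helper.
clean_set <- function(x) {
  x <- trimws(as.character(x))
  # Keep entries that are neither NA nor empty, then deduplicate
  unique(x[!is.na(x) & nzchar(x)])
}

df <- data.frame(T1 = c("GVVEVTHDLQK", "LFYADHPFIFLVR", ""),
                 T2 = c("GVVEVTHDLQK", " ", NA),
                 stringsAsFactors = FALSE)
l <- lapply(as.list(df), clean_set)
lengths(l)
```

The cleaned list l can then be passed on in place of the unfiltered one.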

ADD REPLY • link 5.7 years ago by AK ★ 2.2k

0

Hi SMK, thank you again for your quick response. Much appreciated.

ADD REPLY • link 5.7 years ago by ishackm ▴ 110

0

Cool, glad it helps!

ADD REPLY • link 5.7 years ago by AK ★ 2.2k

score 1 · Answer 2 · 2019-07-02

1

5.8 years ago

zx8754 12k

Convert the peptides to a TRUE/FALSE membership table, then use limma's vennCounts:

# example data
df <- data.frame(
  TGEClass.known = c(
    "GVVEVTHDLQK",
    "LFYADHPFIFLVR",
    "SALQSINEWAAQTTDGK",
    "AVLSAEQLRDEEVHAGLGELLR"
  ),
  TGEClass.uknown = c(
    "GVVEVTHDLQK",
    "LFYADHPFIFLVR",
    "SALQSINEWAAQTTDGK",
    "AVLSAEQLRDEEVHAGLGELL"
  ), stringsAsFactors = FALSE
)

library(data.table)

x <- dcast(cbind(stack(as.list(df)), x = TRUE), 
           values ~ ind, 
           value.var = "x", 
           fill = FALSE)[, -1]    

limma::vennCounts(x)
#   TGEClass.known TGEClass.uknown Counts
# 1              0               0      0
# 2              0               1      1
# 3              1               0      1
# 4              1               1      3

limma::vennDiagram(x)

ADD COMMENT • link 5.8 years ago by zx8754 12k

0

Hi, I ran the code you gave me, but it gives me an error:

    df = read.csv("FN1.csv")
    FN1 = as.vector(df)



    library(data.table)

    x <- dcast(cbind(stack(as.list(FN1)), x = TRUE), 
               values ~ ind, 
               value.var = "x", 
               fill = FALSE)[, -1]    
    limma::vennCounts(x)

Error in stack.default(as.list(FN1)) : 
  at least one vector element is required

What am I doing wrong here, please?

ADD REPLY • link 5.8 years ago by ishackm ▴ 110

0

You need to share your example CSV: FN1.csv, so that we can reproduce the problem.
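
One possible cause, offered only as a guess since the file is not shown: in R versions before 4.0, read.csv turns text columns into factors by default, and stack() ignores non-vector elements such as factors. The sketch below uses a made-up factor column to trigger an error from stack() and shows that the same data stack normally once converted to character:

```r
# Made-up example column; a factor stands in for what read.csv returned before R 4.0
f <- data.frame(T1 = factor(c("GVVEVTHDLQK", "LFYADHPFIFLVR")))

# stack() on a list that contains only factors fails; capture the message
msg <- tryCatch(stack(as.list(f)), error = function(e) conditionMessage(e))
print(msg)

# Converting each column to character first lets stack() work
ok <- stack(lapply(as.list(f), as.character))
print(ok)
```

If this is the cause, reading the file with stringsAsFactors = FALSE should have the same effect as the conversion shown here.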

ADD REPLY • link 5.8 years ago by zx8754 12k

0

Sorry for the late reply,

this is the csv I am using:

T2  T3
QHDMGHMMR   QHDMGHMMR
RPGGEPSPEGTTGQSYNQYSQR  RPGGEPSPEGTTGQSYNQYSQR
KTDELPQLVTLPHPNLHGPEILDVPSTVQK  KTDELPQLVTLPHPNLHGPEILDVPSTVQK
HRPRPYPPNVGEEIQIGHIPR   HRPRPYPPNVGEEIQIGHIPR
QHDMGHMMR   QHDMGHMMR
DQCIVDDITYNVNDTFHK  DQCIVDDITYNVNDTFHK
YYRITYGETGGNSPVQEFTVPGSK    YYRITYGETGGNSPVQEFTVPGSK

The code:

test = read.csv("test.csv", stringsAsFactors = FALSE)


library(gplots)
# example data



library(data.table)

x <- dcast(cbind(stack(as.list(df2)), x = TRUE), 
           values ~ ind, 
           value.var = "x", 
           fill = FALSE)[, -1]    

limma::vennCounts(x)
limma::vennDiagram(x)

The error:

Aggregation function missing: defaulting to length
Error in vapply(indices, fun, .default) : values must be type 'logical',
 but FUN(X[[1]]) result is type 'integer'

How can I fix this please?

ADD REPLY • link 5.8 years ago by ishackm ▴ 110

0

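Yes. TRUE/FALSE does not work here because dcast falls back to counting with length (the peptide QHDMGHMMR is duplicated in your columns), and that integer result clashes with the logical fill value. Replace TRUE/FALSE with 1/0 in dcast; see the example below: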

# example data
df <- read.table(text = "
T2  T3
QHDMGHMMR   QHDMGHMMR
RPGGEPSPEGTTGQSYNQYSQR  RPGGEPSPEGTTGQSYNQYSQR
KTDELPQLVTLPHPNLHGPEILDVPSTVQK  KTDELPQLVTLPHPNLHGPEILDVPSTVQK
HRPRPYPPNVGEEIQIGHIPR   HRPRPYPPNVGEEIQIGHIPR
QHDMGHMMR   QHDMGHMMR
DQCIVDDITYNVNDTFHK  DQCIVDDITYNVNDTFHK
YYRITYGETGGNSPVQEFTVPGSK    YYRITYGETGGNSPVQEFTVPGSK", stringsAsFactors = FALSE, header = TRUE)

library(data.table)

x <- dcast(cbind(stack(as.list(df)), x = 1), 
           values ~ ind, 
           value.var = "x", 
           fill = 0)[, -1]

limma::vennCounts(x)

#   T2 T3 Counts
# 1  0  0      0
# 2  0  1      0
# 3  1  0      0
# 4  1  1      6
# attr(,"class")
# [1] "VennCounts"
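
The "defaulting to length" message appears when a peptide occurs more than once in a column, as QHDMGHMMR does here. A base R sketch on a shortened, made-up copy of the data makes the duplicate visible and shows that deduplicating each column first gives a plain 0/1 table:

```r
# Shortened illustrative data: QHDMGHMMR appears twice in each column
df <- data.frame(T2 = c("QHDMGHMMR", "DQCIVDDITYNVNDTFHK", "QHDMGHMMR"),
                 T3 = c("QHDMGHMMR", "DQCIVDDITYNVNDTFHK", "QHDMGHMMR"),
                 stringsAsFactors = FALSE)

# Long format, then a peptide-by-column count table: the duplicate gives a 2
long <- stack(as.list(df))
raw <- table(long$values, long$ind)
print(raw)

# Deduplicate within each column first: every cell is now 0 or 1
dedup <- stack(lapply(as.list(df), unique))
m <- unclass(table(dedup$values, dedup$ind))
print(m)
```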

ADD REPLY • link 5.8 years ago by zx8754 12k

0

Thanks very much for your quick response

ADD REPLY • link 5.8 years ago by ishackm ▴ 110

0

Hi, unfortunately I have just found that I can't draw a Venn diagram for more than 5 categories.

Can you help me create a df that looks like this please?

TGE-Class     Count
T1              1
T2              1
Both            6

Thanks very much

ADD REPLY • link 5.8 years ago by ishackm ▴ 110

score 0 · Answer 3 · 2019-07-02

0

5.8 years ago

lieven.sterck 15k

If you are looking for exact matches (so that no peptide can be a subset of another), you can use your lists as they are as input for DrawVenn. It's an online tool for drawing Venn diagrams.
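
Whether exact matching is safe for a given dataset can be checked beforehand in R. The sketch below (the helper name find_nested is hypothetical) lists peptides that are contained within a longer peptide, using the two example columns from the question:

```r
# Sketch only: find_nested is a hypothetical helper.
find_nested <- function(peptides) {
  peptides <- unique(peptides)
  hits <- lapply(peptides, function(p) {
    # Other peptides that contain p as a literal substring
    longer <- peptides[peptides != p & grepl(p, peptides, fixed = TRUE)]
    if (length(longer)) {
      data.frame(peptide = p, contained_in = longer, stringsAsFactors = FALSE)
    }
  })
  do.call(rbind, hits)  # NULL when nothing is nested
}

known  <- c("GVVEVTHDLQK", "LFYADHPFIFLVR",
            "SALQSINEWAAQTTDGK", "AVLSAEQLRDEEVHAGLGELLR")
uknown <- c("GVVEVTHDLQK", "LFYADHPFIFLVR",
            "SALQSINEWAAQTTDGK", "AVLSAEQLRDEEVHAGLGELL")
nested <- find_nested(c(known, uknown))
print(nested)
```

In the example data, AVLSAEQLRDEEVHAGLGELL is reported because it is a prefix of AVLSAEQLRDEEVHAGLGELLR, so exact matching counts these as two different peptides.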