Question

Display common genes in sets by a binary matrix

6.9 years ago

lessismore ★ 1.4k

Hey all,

I have, say, 5 sets of genes (each set is a list of genes separated by \n). What I want is to create a binary matrix that shows, for each gene, its presence (1) or absence (0) in each set. Do you have any advice on how to do this in R? My final goal is to pass this matrix to UpSetR.

Thanks in advance
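A minimal sketch of getting the sets into R in the first place, assuming each set is a plain-text file with one gene ID per line. The helper `read_gene_sets` and the file names are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: read one newline-separated gene file per set
# into a named list of character vectors.
read_gene_sets <- function(paths) {
  sets <- lapply(paths, function(p) {
    genes <- trimws(readLines(p, warn = FALSE))  # one gene ID per line
    unique(genes[genes != ""])                  # drop blank lines and duplicates
  })
  # name each set after its file, without directory or extension
  names(sets) <- tools::file_path_sans_ext(basename(paths))
  sets
}

# Usage (hypothetical file names):
# my_sets <- read_gene_sets(c("set1.txt", "set2.txt", "set3.txt", "set4.txt", "set5.txt"))
```

The result is a named list of character vectors, which is the kind of input `UpSetR::fromList` takes.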

r matrix • 2.7k views

updated 6.9 years ago by russhh 5.7k • written 6.9 years ago by lessismore ★ 1.4k

score 1 · Answer 1 · 2018-02-15

6.9 years ago

russhh 5.7k

Use UpSetR::fromList:

library(UpSetR)
my_genes <- list(set1 = letters[1:5], set2 = letters[3:7], set3 = letters[7:12])
fromList(my_genes)

   set1 set2 set3
1     1    0    0
2     1    0    0
3     1    1    0
4     1    1    0
5     1    1    0
6     0    1    0
7     0    1    1
8     0    0    1
9     0    0    1
10    0    0    1
11    0    0    1
12    0    0    1
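Each row of that table is one element, and the intersection bars of an UpSet plot count how many rows share each 0/1 pattern. A base-R sketch of that count for the same toy list; the matrix is built with `%in%` here instead of `fromList` (an assumption made so the snippet runs without UpSetR installed).

```r
my_genes <- list(set1 = letters[1:5], set2 = letters[3:7], set3 = letters[7:12])

elements <- unique(unlist(my_genes))
# one 0/1 column per set, one row per element (same layout as fromList's output)
m <- sapply(my_genes, function(s) as.integer(elements %in% s))

# collapse each row to a pattern string such as "110" and count rows per pattern
patterns <- apply(m, 1, paste, collapse = "")
table(patterns)
```

The pattern strings follow the column order of `m`, so `"110"` means "in set1 and set2 only".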

I put something on my blog a while ago

6.9 years ago by russhh 5.7k

Hello, given your expertise in UpSetR, do you know how I could store the names of the objects in the row names of the table you showed? In this specific example, instead of the row names 1 to 12, I want the actual names of the objects in the intersections (in this case, the letters).

6.8 years ago by lessismore ★ 1.4k

That information isn't used by UpSetR; it only considers the number of elements in the various overlaps. But it's a straightforward exercise in R:

i) first generate a vector my_genes of unique gene-ids,

ii) then iterate over your list of genesets using Map, purrr::map or lapply, indicating for each gene in my_genes whether it is present in the current geneset

iii) then convert the returned list into a data-frame

iv) then add my_genes to the rownames of that data-frame (or preferably, put it in the body of the data-frame rather than the rownames)
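The four steps above can be sketched as follows on the toy list from the answer. `my_sets` stands in for your own named list of gene sets, and the gene IDs are kept in a `gene` column, as step iv) prefers, rather than in the row names.

```r
my_sets <- list(set1 = letters[1:5], set2 = letters[3:7], set3 = letters[7:12])

# i) vector of unique gene IDs across all sets
my_genes <- unique(unlist(my_sets, use.names = FALSE))

# ii) for each set, a 0/1 vector saying whether each gene in my_genes is in it
presence <- lapply(my_sets, function(s) as.integer(my_genes %in% s))

# iii) convert the returned list into a data frame (one column per set)
presence_df <- as.data.frame(presence)

# iv) put the gene IDs in the body of the data frame, as the first column
presence_df <- data.frame(gene = my_genes, presence_df, stringsAsFactors = FALSE)
head(presence_df, 3)
```

If you do want row names instead, `rownames(presence_df) <- my_genes` right after step iii) does that.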

6.8 years ago by russhh 5.7k

Many thanks, I did step i). Then, for step ii), I want to iterate with lapply(my_genes, is.element), but I have no idea how to write the function so that it uses the vector I created.

6.8 years ago by lessismore ★ 1.4k

You'll have to write a function to do that. However, I meant for you to iterate over the sets, rather than over the elements in the union of those sets:

my_sets <- list(set1 = c(...),..., setk=c(...))

i)

my_genes <- unique(unlist(...))

ii)

lapply(my_sets, function(s) my_genes %in% s)

iii) solve the rest yourself
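The difference between the two iteration directions, sketched on the toy letters list from the answer (`by_set` and `by_gene` are illustrative names): iterating over the sets yields the columns of the table directly, while iterating over the genes yields rows that still have to be stacked.

```r
my_sets <- list(set1 = letters[1:5], set2 = letters[3:7], set3 = letters[7:12])
my_genes <- unique(unlist(my_sets))

# Over the sets: one logical vector per set, each of length(my_genes).
# These are the columns of the presence/absence table.
by_set <- lapply(my_sets, function(s) my_genes %in% s)

# Over the genes: one logical vector per gene, each of length(my_sets).
# These are rows, so they need rbind() to form a table.
by_gene <- lapply(my_genes, function(g) vapply(my_sets, function(s) g %in% s, logical(1)))
by_gene_mat <- do.call(rbind, by_gene)

# both directions give the same table once assembled
identical(unname(by_gene_mat), unname(do.call(cbind, by_set)))  # TRUE
```

The set-wise version makes one vectorised `%in%` call per set instead of one call per gene and set pair, which is why it is the simpler choice here.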

6.8 years ago by russhh 5.7k

Thanks, this was what I wanted. Here is my "crude" solution; if you know how to write it better, please advise me :)

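my_genes <- unique(unlist(my_sets))   # all distinct genes across the sets
my_function <- function(x){
  is.element(my_genes, x)             # TRUE where a gene of my_genes is in set x
}

lapply(my_sets, my_function)          # inspect: one logical vector per set
test <- as.data.frame(lapply(my_sets, my_function))
rownames(test) <- my_genes            # genes as row names
test[] <- lapply(test, as.integer)    # TRUE/FALSE -> 1/0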
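One more compact alternative, offered as a sketch rather than the only way: `stack()` turns the named list into a two-column data frame (`values`, `ind`), and cross-tabulating those columns gives a count per gene and set, which is 0/1 as long as no gene is listed twice in the same set. The toy list below is an assumption for illustration.

```r
my_sets <- list(set1 = letters[1:5], set2 = letters[3:7], set3 = letters[7:12])

long <- stack(my_sets)                  # columns: values (gene), ind (set name)
counts <- table(long$values, long$ind)  # genes x sets contingency table
test <- as.data.frame.matrix(counts)    # plain data frame, genes as row names
test[] <- lapply(test, function(col) as.integer(col > 0))  # guard against duplicates
test
```

Rows come out sorted alphabetically (the order of `table()`'s factor levels) rather than in order of first appearance.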

6.8 years ago by lessismore ★ 1.4k