Question

Making sublist of 26 gene lists

6.7 years ago

zizigolu ★ 4.3k

Hi, I have 26 lists of genes. I want to build an overlap (presence/absence) matrix across all 26 lists to use for clustering, something like the one below:

> head(inc[,1:4])
     31439_f_at 31440_at 31441_at 31442_at
set1          1        1        1        1
set2          0        0        0        0
set3          0        0        0        0
>

I tried GSEABase, but found it too complicated. Could someone help me, please?

Ideally, I would like a matrix with the gene lists as the columns (i.e., gene list 1 in column 1, gene list 2 in column 2, etc.) and the union of all genes as the rows, with a "1" in each cell where the gene is present in that gene list and a "0" elsewhere.

R similarity clustering • 1.7k views

ADD COMMENT • link 6.7 years ago by zizigolu ★ 4.3k

score 2 · Accepted Answer · 2018-03-24

6.7 years ago

russhh 5.7k

A related question was asked the other day.

gene_lists <- list(G1 = letters[1:3], G2 = letters[3:7], G3 = letters[6:8])

There are loads of ways to do this.

The solution that @lessismore generated in the comments was effectively:

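library(magrittr)  # provides %>%

make_bipartite_adjacency_from_sets <- function(list_of_sets) {
    # Union of all genes across the sets, sorted for a stable row order
    universe <- sort(unique(unlist(list_of_sets)))
    # One 0/1 column per set: 1 where the gene is a member of that set
    adjacency_df <- lapply(list_of_sets, function(x) as.numeric(universe %in% x)) %>%
        as.data.frame()
    rownames(adjacency_df) <- universe
    adjacency_df
}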

make_bipartite_adjacency_from_sets(gene_lists)

  G1 G2 G3
a  1  0  0
b  1  0  0
c  1  1  0
d  0  1  0
e  0  1  0
f  0  1  1
g  0  1  1
h  0  0  1
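Since the question mentions clustering, here is a minimal sketch of one way the 0/1 matrix could be used for that. It rebuilds the toy matrix in base R so it runs on its own; the choice of Jaccard distance and average linkage is my assumption, not something the question specifies, and the names adjacency, set_dist and set_clust are just for illustration.

```r
# Toy gene sets, named so the columns get readable labels
gene_lists <- list(G1 = letters[1:3], G2 = letters[3:7], G3 = letters[6:8])

# Rebuild the 0/1 membership matrix (genes in rows, sets in columns)
universe <- sort(unique(unlist(gene_lists)))
adjacency <- sapply(gene_lists, function(x) as.numeric(universe %in% x))
rownames(adjacency) <- universe

# dist() compares rows, so transpose to compare the gene sets;
# method = "binary" is the Jaccard distance on presence/absence data
set_dist <- dist(t(adjacency), method = "binary")

# Hierarchical clustering of the gene sets (average linkage, an arbitrary choice)
set_clust <- hclust(set_dist, method = "average")
```

Calling plot(set_clust) would draw the dendrogram. To cluster genes rather than gene sets, drop the t() so that dist() compares the rows of the matrix directly.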

You could also do a tidyverse version (tibbles do not support row names, so the gene IDs go in a gene_id column instead):

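library(magrittr)  # provides %>%

make_bipartite_adjacency_from_sets2 <- function(list_of_sets){
    list_of_sets %>%
        # One long table per set: a row for every gene it contains
        purrr::map(function(x) tibble::tibble(gene_id = x, adj = 1)) %>%
        # Stack the tables, recording the list names in set_id
        dplyr::bind_rows(.id = "set_id") %>%
        # One column per set; genes absent from a set get 0
        tidyr::spread(key = set_id, value = adj, fill = 0)
}

make_bipartite_adjacency_from_sets2(gene_lists)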
# A tibble: 8 x 4
  gene_id    G1    G2    G3
*   <chr> <dbl> <dbl> <dbl>
1       a     1     0     0
2       b     1     0     0
3       c     1     1     0
4       d     0     1     0
5       e     0     1     0
6       f     0     1     1
7       g     0     1     1
8       h     0     0     1
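If you want to avoid extra packages and keep the gene IDs as row names, a base R variant is also possible. This is only a sketch: it assumes the list is named and that no gene is repeated within a single list (a repeated gene would give a count above 1, so run unique() on each list first if that can happen). The names long and adjacency are made up for the example.

```r
# Toy gene sets; the list must be named for stack() to work
gene_lists <- list(G1 = letters[1:3], G2 = letters[3:7], G3 = letters[6:8])

# stack() turns the named list into a two-column data frame:
# 'values' holds the genes and 'ind' holds the name of the list each came from
long <- stack(gene_lists)

# Cross-tabulate gene against set; unclass() leaves a plain integer matrix
adjacency <- unclass(table(gene_id = long$values, set_id = long$ind))

adjacency["c", ]
```

The result has genes as row names and one column per gene list, so it can be passed straight to functions that expect a matrix.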

ADD COMMENT • link 6.7 years ago by russhh 5.7k

Thank you. The problem is that I can't figure out how to make a gene list. For instance, how do I make a gene list (say, gene set 1) from a column of genes?

I made GS1 as shown in this screenshot:

https://ibb.co/khxeKn

but it says:

> make_bipartite_adjacency_from_sets(gene_lists)
Error in attributes(.Data) <- c(attributes(.Data), attrib) : 
  'names' attribute [1704] must be the same length as the vector [2]
Called from: structure(res, levels = lv, names = nm, class = "factor")
Browse[1]>

ADD REPLY • link 6.7 years ago by zizigolu ★ 4.3k

Can't you just put all your GS* vectors into a named list?
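A rough sketch of what that could look like, assuming each GS* object is a one-column data frame of probe IDs (I can't tell for sure from the screenshot, but the error above would be consistent with data frames rather than plain vectors being passed in). The two small data frames here are made-up stand-ins for your real data.

```r
# Hypothetical stand-ins for the real data: one-column data frames of probe IDs
GS1 <- data.frame(gene = c("31439_f_at", "31440_at", "31441_at"))
GS2 <- data.frame(gene = c("31441_at", "31442_at"))

# Names of the objects to collect (for the real data: paste0("GS", 1:26))
set_names <- paste0("GS", 1:2)

# mget() fetches the objects by name and returns them as a named list
gs_frames <- mget(set_names)

# Keep the first column of each as a plain character vector, dropping duplicates
gene_lists <- lapply(gs_frames, function(df) unique(as.character(df[[1]])))

str(gene_lists)
```

The resulting gene_lists keeps the names GS1, GS2, and so on, so those names become the column labels of the matrix.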

ADD REPLY • link 6.7 years ago by russhh 5.7k

Thanks a lot. Without your help I definitely could not have figured this out for at least 2 weeks.

This is what I did:

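library(igraph)  # igraph re-exports %>%, which the function below uses

# Flatten each one-column table of genes into a plain vector
GS1 <- c(t(GS1))
GS2 <- c(t(GS2))

# Named list, so the output columns are labelled GS1 and GS2
gene_lists <- list(GS1 = GS1, GS2 = GS2)

make_bipartite_adjacency_from_sets <- function(list_of_sets) {
    universe <- sort(unique(unlist(list_of_sets)))
    adjacency_df <- lapply(list_of_sets, function(x) as.numeric(universe %in% x)) %>% as.data.frame()
    rownames(adjacency_df) <- universe
    adjacency_df
}

make_bipartite_adjacency_from_sets(gene_lists)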

ADD REPLY • link 6.7 years ago by zizigolu ★ 4.3k