Making sublist of 26 gene lists
6.7 years ago
zizigolu ★ 4.3k

Hi, I have 26 lists of genes. I want to build an overlap (membership) matrix across all 26 lists for clustering, something like the one below:

> head(inc[,1:4])
     31439_f_at 31440_at 31441_at 31442_at
set1          1        1        1        1
set2          0        0        0        0
set3          0        0        0        0
>

I tried 'GSEABase', but it was too complicated. Could someone help me, please?

Ideally, the result would be a matrix with the gene lists as the columns (i.e., gene list 1 in column 1, gene list 2 in column 2, etc.) and the union of all genes as the rows, with a "1" in each cell where the gene is present in that gene list and a "0" elsewhere.

R similarity clustering • 1.7k views
6.7 years ago
russhh 5.7k

A related question was asked the other day.

gene_lists = list(G1 = letters[1:3], G2 = letters[3:7], G3 = letters[6:8])

There are loads of ways to do this.

The solution that @lessismore generated in the comments was effectively:

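library(magrittr)  # provides the %>% pipe

make_bipartite_adjacency_from_sets <- function(list_of_sets){
    # every gene that appears in at least one set, sorted
    universe <- sort(unique(unlist(list_of_sets)))
    # one 0/1 column per set: 1 if the gene is in that set
    adjacency_df <- lapply(list_of_sets, function(x) as.numeric(universe %in% x)) %>% as.data.frame()
    rownames(adjacency_df) <- universe
    adjacency_df
}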

make_bipartite_adjacency_from_sets(gene_lists)

  G1 G2 G3
a  1  0  0
b  1  0  0
c  1  1  0
d  0  1  0
e  0  1  0
f  0  1  1
g  0  1  1
h  0  0  1
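If you would rather have a plain matrix, with the sets as rows like the `inc` object in the question, here is a minimal base-R sketch. It rebuilds the same toy sets; the names `inc` and `inc_by_set` are just illustrative, and it assumes every list element is a character vector.

```r
# Toy input: three named gene sets
gene_lists <- list(G1 = letters[1:3], G2 = letters[3:7], G3 = letters[6:8])

# Union of all genes, sorted
universe <- sort(unique(unlist(gene_lists)))

# sapply() returns a matrix here because every result has the same length:
# one column per gene set, one row per gene
inc <- sapply(gene_lists, function(x) as.integer(universe %in% x))
rownames(inc) <- universe

# Transpose to get sets as rows and genes as columns
inc_by_set <- t(inc)
inc_by_set
```

Whether you want the sets in rows or in columns depends on what the downstream function expects, so it is worth checking before clustering.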

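You could also do a tidyverse version (tibbles don't carry row names, so the gene IDs end up in a gene_id column):

make_bipartite_adjacency_from_sets2 <- function(list_of_sets){
    list_of_sets %>%
        # one long table per set: gene_id plus a constant 1
        # (tibble::tibble replaces the deprecated data_frame)
        purrr::map(function(x) tibble::tibble(gene_id = x, adj = 1)) %>%
        # stack the tables, keeping the set name in set_id
        dplyr::bind_rows(.id = "set_id") %>%
        # one column per set, filling absent genes with 0
        tidyr::spread(key = set_id, value = adj, fill = 0)
}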

make_bipartite_adjacency_from_sets2(gene_lists)
# A tibble: 8 x 4
  gene_id    G1    G2    G3
*   <chr> <dbl> <dbl> <dbl>
1       a     1     0     0
2       b     1     0     0
3       c     1     1     0
4       d     0     1     0
5       e     0     1     0
6       f     0     1     1
7       g     0     1     1
8       h     0     0     1
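Since the question mentions clustering, here is a rough sketch of how a 0/1 matrix like this could be fed to hierarchical clustering in base R. It assumes you want to cluster the gene sets (not the genes) and that a Jaccard-style distance is appropriate; both are assumptions on my part, and the average linkage is an arbitrary choice.

```r
# Toy input: three named gene sets
gene_lists <- list(G1 = letters[1:3], G2 = letters[3:7], G3 = letters[6:8])

# Gene-by-set 0/1 matrix
universe <- sort(unique(unlist(gene_lists)))
inc <- sapply(gene_lists, function(x) as.integer(universe %in% x))
rownames(inc) <- universe

# dist() compares rows, so transpose to compare gene sets.
# method = "binary" ignores genes absent from both sets,
# which makes it the Jaccard distance on membership vectors.
set_dist <- dist(t(inc), method = "binary")

# Average-linkage hierarchical clustering of the sets
set_clust <- hclust(set_dist, method = "average")
```

`plot(set_clust)` would then draw the dendrogram. With only three toy sets it is not very informative, but the same calls should apply to 26 lists.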

Thank you. The problem is that I can't figure out how to make a gene list. For instance, how do I make a gene list (say, gene set 1) from a column of genes?

I made GS1 as shown in this screenshot:

https://ibb.co/khxeKn

but it says:

> make_bipartite_adjacency_from_sets(gene_lists)
Error in attributes(.Data) <- c(attributes(.Data), attrib) : 
  'names' attribute [1704] must be the same length as the vector [2]
Called from: structure(res, levels = lv, names = nm, class = "factor")
Browse[1]>

Can't you just put all your GS* vectors into a named list?
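Something along these lines might do it. The screenshot isn't reproduced here, so the shape of GS1 and GS2 is a guess: this sketch assumes each one is a one-column data frame of probe IDs, and the toy values and the column name `gene` are made up for illustration.

```r
# Hypothetical stand-ins for GS1 and GS2: one-column data frames
GS1 <- data.frame(gene = c("31439_f_at", "31440_at", "31441_at"),
                  stringsAsFactors = TRUE)
GS2 <- data.frame(gene = c("31440_at", "31442_at"),
                  stringsAsFactors = TRUE)

# Pull out the column and convert it to character, so each list element
# is a plain character vector rather than a data frame or a factor
gene_lists <- list(
    GS1 = as.character(GS1$gene),
    GS2 = as.character(GS2$gene)
)

# With many sets, mget() can collect the objects by name instead
gs_objects <- mget(paste0("GS", 1:2))
gene_lists2 <- lapply(gs_objects, function(df) as.character(df[[1]]))

str(gene_lists)
```

For 26 sets you would change `1:2` to `1:26`, assuming the objects really are called GS1 to GS26. The list names become the column names of the adjacency table.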


Thanks a lot. Without your help I definitely could not have figured this out for at least 2 weeks...

This is what I did:

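library(igraph)

# flatten each GS object into a plain vector
GS1 = c(t(GS1))
GS2 = c(t(GS2))

# a named list gives the output columns readable names
gene_lists = list(GS1 = GS1, GS2 = GS2)

make_bipartite_adjacency_from_sets <- function(list_of_sets){
    # every gene that appears in at least one set, sorted
    universe <- sort(unique(unlist(list_of_sets)))
    # one 0/1 column per set: 1 if the gene is in that set
    adjacency_df <- lapply(list_of_sets, function(x) as.numeric(universe %in% x)) %>% as.data.frame()
    rownames(adjacency_df) <- universe
    adjacency_df
}

make_bipartite_adjacency_from_sets(gene_lists)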