6.8 years ago
lessismore
Hey all,
I have, say, 5 sets of genes (each set with genes separated by \n). I want to create a binary matrix that shows, for each gene, its presence (1) or absence (0) in each set. Do you have any advice on how to do this in R? My final goal is to submit this matrix to UpSetR.
Thanks in advance.
Hello, given your expertise in UpSetR, do you know how I could store the names of the objects in the rownames of the table you showed? In this specific example, instead of the rownames 1 to 12, I want the real names of the objects in the intersections (in this case, the letters).
That information isn't used by UpSetR; it only considers the number of elements in the various overlaps. But it's a straightforward exercise in R:
i) first generate a vector my_genes of unique gene IDs,
ii) then iterate over your list of gene sets using Map, purrr::map or lapply, indicating for each gene in my_genes whether it is present in the current gene set,
iii) then convert the returned list into a data frame,
iv) then set my_genes as the rownames of that data frame (or preferably, put it in the body of the data frame rather than in the rownames).
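The four steps above can be sketched in base R as follows. This is only a minimal illustration under assumptions: the three example sets and the names membership and binary_df are hypothetical and not part of the original posts.

```r
# Hypothetical example sets; replace these with your own gene lists.
my_sets <- list(
  set1 = c("A", "B", "C"),
  set2 = c("B", "C", "D"),
  set3 = c("A", "D", "E")
)

# i) all unique gene IDs across the sets
my_genes <- unique(unlist(my_sets, use.names = FALSE))

# ii) for each set, flag which of my_genes it contains (TRUE/FALSE -> 1/0)
membership <- lapply(my_sets, function(s) as.integer(my_genes %in% s))

# iii) + iv) one 0/1 column per set, with the gene IDs kept as a regular column
binary_df <- data.frame(gene = my_genes, membership, stringsAsFactors = FALSE)
```

If the gene names are not needed in the table, UpSetR's fromList() builds a 0/1 data frame directly from a named list such as my_sets.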
Many thanks, I did point 1. Then for point 2 I want to iterate with
lapply(my_genes, is.element)
but I have no idea how to write the function so that it uses the vector I created.
You'll have to write a function to do that. However, I meant for you to iterate over the sets, rather than over the elements in the union of those sets:
my_sets <- list(set1 = c(...),..., setk=c(...))
i)
my_genes <- unique(unlist(...))
ii)
lapply(my_sets, function(s) my_genes %in% s)
iii) solve the rest yourself
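One possible way to finish steps iii) and iv) with the gene names as rownames is to let sapply bind the per-set results into a matrix. This is a sketch under assumptions, not the poster's own solution: the two example sets and the names binary_mat and binary_df are made up for illustration.

```r
# Hypothetical example sets; replace these with your own gene lists.
my_sets <- list(set1 = c("A", "B"), set2 = c("B", "C"))

# Union of all genes, in order of first appearance
my_genes <- unique(unlist(my_sets, use.names = FALSE))

# sapply binds the per-set logical vectors into a genes x sets matrix;
# adding 0L converts TRUE/FALSE into 1/0 while keeping the column names.
# (With a single gene, sapply returns a plain vector instead of a matrix.)
binary_mat <- sapply(my_sets, function(s) my_genes %in% s) + 0L

# Use the gene IDs as rownames, then convert to a data frame
rownames(binary_mat) <- my_genes
binary_df <- as.data.frame(binary_mat)
```

Keeping the names in rownames is convenient for lookups such as binary_df["B", ], although storing them in an ordinary column is usually more robust when the table is passed on to other functions.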
Thanks man, this was what I wanted. That's my "rude" solution; if you know how to write it better, please advise me :)