I'm new to R. I have two gene lists (one with unregulated genes, the other a process list with GO IDs and genes), and I'd like to know how to generate a Venn diagram and see which genes overlap.
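You don't need R, just two lists of genes, one list for unregulated and one list for your other category. You can paste each set into BioVenn to make a roughly proportional circular Venn diagram. If you want a list of overlaps, just use grep between the two lists. Here -f reads the patterns from listA, -F treats them as fixed strings rather than regular expressions, and -w only accepts whole-word matches, so a short gene name does not match inside a longer one:
$ grep -Fwf listA listB > answer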
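If you would rather stay in R, here is a minimal sketch of the same overlap. It assumes each file holds one gene ID per line; the gene names and temporary files are made-up stand-ins for listA and listB.

```r
# Write two small example lists to temporary files
# (hypothetical stand-ins for listA and listB, one gene ID per line)
listA <- tempfile()
listB <- tempfile()
writeLines(c("TP53", "BRCA1", "EGFR", "MYC"), listA)
writeLines(c("MYC", "KRAS", "TP53"), listB)

# Read the IDs, trimming stray whitespace
a <- trimws(readLines(listA))
b <- trimws(readLines(listB))

# Genes present in both files
overlap <- intersect(a, b)
print(overlap)
```

If the second file has extra columns (GO IDs, for example), you would need to pull out the gene column first, e.g. with read.table, before calling intersect.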
Here are three simple methods using R. Thomas Girke has a lot of useful R stuff. Find it and use it.
# generate two random lists
a <- sample(LETTERS,20)
b <- sample(LETTERS,20)
# grab Thomas Girke's Venn Diagram functions
source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/vennDia.R")
# draw a diagram
qlist <- venndiagram(x=a, y=b, unique=T, title="2-Way Venn Diagram", labels=c("Sample1", "Sample2")
,plot=T, lines=c(2,3), lcol=c(2,3), lwd=3, cex=1.3, type="2")
You'll find that qlist is a list containing the elements of each Venn diagram region. e.g.
> qlist$q1
[1] "M" "J" "F" "U" "O" "K" "S" "T" "G" "Q" "Y" "Z" "L" "W" "A"
However, if you just need simple counts, you can use the %in% operator:
# count the common elements of a and b
sum(a %in% b)
# get the elements themselves
a[a %in% b]
# or use other functions for sets
intersect(a,b)
length(intersect(a,b))
Note: if you examine the code at the URL above (just load it into your browser) he says he has replaced that Venn Diagram function with something better. Might be worth examining.
R package: VennDiagram
Check this Biostar post for more info.
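As a rough sketch of what using the VennDiagram package might look like: this assumes the package is installed from CRAN, and the random letters and the "Sample1"/"Sample2" labels are placeholders. draw.pairwise.venn takes the two set sizes and the overlap size rather than the lists themselves, so the counts are worked out first with base R.

```r
# Two example lists (random letters standing in for gene IDs)
set.seed(1)
a <- sample(LETTERS, 20)
b <- sample(LETTERS, 20)

# Sizes needed for a two-set diagram, computed with base R
n_a  <- length(unique(a))      # total size of set a
n_b  <- length(unique(b))      # total size of set b
n_ab <- length(intersect(a, b)) # size of the overlap

# Draw only if the VennDiagram package is available
if (requireNamespace("VennDiagram", quietly = TRUE)) {
  grid::grid.newpage()
  VennDiagram::draw.pairwise.venn(
    area1 = n_a, area2 = n_b, cross.area = n_ab,
    category = c("Sample1", "Sample2")
  )
}
```

Because only counts are passed in, the same call should also work for very large lists once you have the three numbers.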
I have more than 1,000,000 genes in one gene list, and the tools can't handle that many. What should I do?
Do element counts of each set category. Then turn these counts into smaller values and make dummy lists.
For instance, say you have 1M elements unique to set A, 500k elements unique to set B, and 300k elements in the intersection A ∩ B.
Divide all the element count values by 10k. Then you have 100 new "elements" in set A, 50 new "elements" in set B, and 30 "elements" in set A ∩ B.
Now make three new lists with these element counts.
If you want to do element counts, the simplest way is to use the command-line tool wc to do line counts, as in:
Can you clarify more? How do I do this in R?
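A rough sketch of the counting and scaling steps in base R. The gene IDs here are invented and much smaller than 1M so it runs quickly, and the divisor of 10k is arbitrary. Since tools like BioVenn take two lists, the dummy lists for A and B below both include the shared dummy elements.

```r
# Toy stand-ins for the two large lists (hypothetical IDs, not real genes)
a <- paste0("gene", 1:130000)       # set A
b <- paste0("gene", 100001:180000)  # set B

# Element counts for each region of the diagram
n_only_a <- length(setdiff(a, b))    # unique to A
n_only_b <- length(setdiff(b, a))    # unique to B
n_both   <- length(intersect(a, b))  # in both A and B

# Scale the counts down (the divisor is an arbitrary choice)
scale <- 10000
k <- round(c(only_a = n_only_a, only_b = n_only_b, both = n_both) / scale)

# Build dummy lists with the scaled region sizes
shared  <- paste0("both_", seq_len(k[["both"]]))
dummy_a <- c(paste0("a_", seq_len(k[["only_a"]])), shared)
dummy_b <- c(paste0("b_", seq_len(k[["only_b"]])), shared)

print(k)
```

dummy_a and dummy_b can then be written out with writeLines and pasted into whichever Venn tool you are using. The diagram keeps roughly the right proportions, but the region labels will show the scaled counts, not the real ones.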
And BioVenn will also give a list of the overlap. Just click on one of the links below the diagram (e.g. x-y total overlap).