19 months ago
Pac314
Is it appropriate to take the three-way intersection of three overlapping gene sets and use the phyper function in R to assess the significance of the overlap? I have used the hypergeometric distribution for pairwise intersections before, but not for intersections of more than two sets.
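You're better off just simulating it. For 1,000 or more iterations, randomly sample three gene sets of the same sizes as your original three, drawing from the same background (all genes that could have ended up in any of the sets), and record the size of the three-way overlap. For significance at the 0.05 level, your observed overlap should be larger than at least 95% of these simulated overlaps.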
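A minimal sketch of that simulation, written in Python with only the standard library (the same logic carries over to R with sample() and Reduce(intersect, ...)). The function name, the background of 20,000 genes, the set sizes, and the observed overlap of 12 are all made-up values for illustration, not numbers from this thread. The p-value uses the common add-one correction so it is never exactly zero.

```python
import random

def simulate_triple_overlap(universe, sizes, observed, n_iter=1000, seed=0):
    """Empirical p-value for a three-way overlap at least as large as `observed`.

    universe : iterable of all background genes (assumed sampling pool)
    sizes    : sizes of the three original gene sets
    observed : size of the observed three-way intersection
    """
    rng = random.Random(seed)
    universe = list(universe)
    hits = 0
    for _ in range(n_iter):
        # Draw three random gene sets, each matching one original set's size.
        sets = [set(rng.sample(universe, k)) for k in sizes]
        overlap = len(set.intersection(*sets))
        if overlap >= observed:
            hits += 1
    # Add-one correction: the observed data counts as one permutation.
    return (hits + 1) / (n_iter + 1)

# Hypothetical example: 20,000 background genes, sets of 500, 600 and 700
# genes, and an observed three-way overlap of 12 genes.
background = [f"gene{i}" for i in range(20000)]
p_value = simulate_triple_overlap(background, sizes=(500, 600, 700), observed=12)
print(p_value)
```

The choice of background matters more than the number of iterations: sampling from the whole genome when only expressed genes could have entered your sets will make the overlap look more significant than it is.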
+1 for the simulation mentioned by rpolicastro; it is the typical approach. However, also check out the SuperExactTest R package, which implements testing for intersections of multiple sets (it is also nice for analyzing all possible intersections in one go).
Thank you both for your answers! This library is perfect for my analysis. Thank you for sharing this!