1) I am confused about how to test my data for enrichment. I have tried phyper, but I don't think I understand how to apply it to the overlaps among three lists of peaks. My data look like the following:
- list a - 5,627 #list of unique peaks for a transcription factor
- list b - 2,533 #list of unique peaks for a transcription factor
- list c - 3,989 #list of unique peaks for a transcription factor
The number of common peaks (overlap) across all three files (i.e., the center of a Venn diagram) is 2,329.
The total number of unique peaks possible for all three transcription factors (if they perfectly overlapped the reference data) is 8,669.
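For reference, here is a minimal sketch of the two-list version of this test, written out in plain Python so the arithmetic is visible. It rests on several assumptions that are not in the numbers above: the 8,669 peaks are treated as the universe, the hypergeometric model only compares two lists at a time (here a and b), and because the pairwise a/b overlap is not given, the three-way overlap of 2,329 is used as a stand-in lower bound for it. The function and argument names are made up for illustration. In R, the same upper tail would be phyper(overlap - 1, size_a, universe - size_a, size_b, lower.tail = FALSE); the "- 1" matters because lower.tail = FALSE returns P(X > q), not P(X >= q).

```python
from math import comb

def hypergeom_upper_tail(overlap, universe, size_a, size_b):
    """P(X >= overlap) when size_b peaks are drawn at random from a universe
    of `universe` peaks, of which size_a belong to list a."""
    max_overlap = min(size_a, size_b)
    # Count the ways to pick size_b peaks with k of them falling in list a.
    favourable = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, max_overlap + 1)
    )
    # Exact integer arithmetic; very small results can underflow to 0.0.
    return favourable / comb(universe, size_b)

# Numbers from the post. ASSUMPTION: 2,329 (the three-way overlap) stands in
# for the unknown a/b pairwise overlap, so this is only an illustration.
p_value = hypergeom_upper_tail(overlap=2329, universe=8669, size_a=5627, size_b=2533)
print(p_value)
```

The expected a/b overlap under random sampling is about 5,627 x 2,533 / 8,669, roughly 1,644 peaks, so an overlap of 2,329 or more sits far out in the upper tail. This says nothing yet about the significance of a true three-way overlap, which the two-list hypergeometric model does not cover.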
2) These data pertain to intergenic regions. I would also like to test whether these transcription factors are more enriched in intergenic regions than in intragenic regions, assuming I have similar data for the intragenic regions.
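One way the intergenic versus intragenic comparison could be framed is as a 2x2 table (region type by whether a peak is shared by all three factors) tested with a one-sided Fisher's exact test. The sketch below is only that, a sketch: the intragenic counts are hypothetical placeholders because no intragenic data are given here, the function name is made up, and whether "shared by all three" is the right column split is itself an assumption.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].
    Returns P(top-left cell >= a) with all row and column totals fixed."""
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    # Hypergeometric upper tail over every table at least as extreme as observed.
    favourable = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(total, row1)

# Row 1: intergenic peaks from the post (2,329 shared, 8,669 - 2,329 not shared).
intergenic_shared, intergenic_other = 2329, 8669 - 2329
# Row 2: HYPOTHETICAL intragenic counts, for illustration only.
intragenic_shared, intragenic_other = 1500, 7500

p_value = fisher_one_sided(intergenic_shared, intergenic_other,
                           intragenic_shared, intragenic_other)
print(p_value)
```

A small p-value here would suggest that the shared peaks make up a larger fraction of intergenic peaks than of intragenic peaks. It treats each peak as an independent observation, which may not hold for clustered binding sites.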
I might simply be misunderstanding the phyper function and need something completely different. Essentially, I want a p-value or similar statistic that tells me whether these three transcription factors are enriched in the reference regions.
I've begun experimenting with bedtools fisher. Could this be used to achieve what I'm looking for? If so, how?
Thank you,
Carlos
Thanks for the comment, Ian. However, your link leads me to a blank page with a "This link may not be followed from within Galaxy" error, even though I never use Galaxy.
EDIT: Just went ahead and googled it. I'll give it a shot and then let you know what I find in another comment. Thank you!
I haven't fully gone through the Genomic HyperBrowser option yet, but I was wondering whether it would be possible to take my BED file of 'complex' overlaps and run it against a BED annotation file to test whether the complex is over-represented in the genome.
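To make the idea concrete, this is roughly the kind of test I have in mind, sketched as a simple permutation test. It is a toy under stated assumptions: a single chromosome, intervals as half-open (start, end) pairs as in BED, and peaks shuffled uniformly along the chromosome, which ignores gaps, mappability and other biases. All function names and the example coordinates are hypothetical. It is similar in spirit to shuffling intervals with a tool such as bedtools shuffle and recounting overlaps.

```python
import random

def count_overlapping(peaks, annotations):
    """Count peaks that overlap at least one annotation interval.
    Intervals are half-open (start, end) pairs, as in BED."""
    return sum(
        any(p_start < a_end and a_start < p_end for a_start, a_end in annotations)
        for p_start, p_end in peaks
    )

def shuffle_peaks(peaks, chrom_length, rng):
    """Move each peak to a uniformly random position, keeping its width."""
    shuffled = []
    for start, end in peaks:
        width = end - start
        new_start = rng.randrange(0, chrom_length - width + 1)
        shuffled.append((new_start, new_start + width))
    return shuffled

def permutation_p_value(peaks, annotations, chrom_length, n_perm=1000, seed=0):
    """Empirical one-sided p-value: how often do shuffled peaks overlap the
    annotation at least as many times as the observed peaks do?"""
    rng = random.Random(seed)
    observed = count_overlapping(peaks, annotations)
    at_least = sum(
        count_overlapping(shuffle_peaks(peaks, chrom_length, rng), annotations) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least + 1) / (n_perm + 1)

# HYPOTHETICAL toy coordinates, not real peaks or annotations.
toy_peaks = [(100, 110), (120, 130), (150, 160)]
toy_annotation = [(90, 200)]
observed, p_value = permutation_p_value(toy_peaks, toy_annotation, chrom_length=1_000_000)
print(observed, p_value)
```

With n_perm permutations the smallest p-value this can report is 1 / (n_perm + 1), so more permutations are needed to resolve very strong enrichment.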