8.7 years ago
michealsmith
I would like to test whether genes containing at least one binding site for a transcription factor (say, MEF2A) are enriched for a certain category.
I could easily build a list of TF-containing genes by intersecting the TF binding-site BED files with gene-annotation BED files, and then run an enrichment analysis on that list.
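Here is a minimal pure-Python sketch of that intersection step. It assumes both inputs are plain BED (0-based, half-open) and that the gene file carries the gene name in column 4. The lines fed to it are made-up toy records. On real data, `bedtools intersect -a genes.bed -b tf_sites.bed -u` reports each gene with at least one overlapping site exactly once and is much faster. The sketch only makes the overlap rule explicit.

```python
from collections import defaultdict
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int      # 0-based, inclusive (BED convention)
    end: int        # exclusive (BED convention)
    name: str = ""  # column 4 if present


def parse_bed(lines):
    """Parse tab-separated BED lines into Interval tuples, skipping header lines."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        name = fields[3] if len(fields) > 3 else ""
        intervals.append(Interval(fields[0], int(fields[1]), int(fields[2]), name))
    return intervals


def genes_with_site(genes, sites):
    """Names of genes overlapped by at least one site (half-open overlap test)."""
    sites_by_chrom = defaultdict(list)
    for s in sites:
        sites_by_chrom[s.chrom].append(s)
    hits = []
    for g in genes:
        if any(s.start < g.end and s.end > g.start for s in sites_by_chrom[g.chrom]):
            hits.append(g.name)
    return hits


genes = parse_bed(["chr1\t100\t200\tA", "chr1\t300\t400\tB", "chr2\t0\t50\tC"])
sites = parse_bed(["chr1\t150\t160", "chr2\t50\t60"])
print(genes_with_site(genes, sites))  # ['A']
```

The chr2 site starts at base 50, which is exactly where gene C ends. Under the half-open BED convention they do not overlap, so C is not reported.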
But here is the question: a large gene is naturally more likely to contain TF binding sites. So should I first control for gene size?
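To put a number on that intuition, here is a toy model. It assumes, unrealistically, that binding sites are scattered uniformly at a constant density ρ per base. Under that assumption, the site count in a gene of length L is Poisson with mean ρL, so P(at least one site) = 1 − exp(−ρL). The density used below is hypothetical.

```python
import math


def p_at_least_one_site(gene_length_bp, sites_per_bp):
    """P(gene contains >= 1 site) if sites fall as a homogeneous Poisson process.

    The uniform-density assumption is for illustration only; real TF sites cluster.
    """
    expected_sites = sites_per_bp * gene_length_bp
    return 1.0 - math.exp(-expected_sites)


# Hypothetical density: one site per 100 kb on average.
density = 1e-5
for length in (10_000, 50_000, 200_000):
    print(length, round(p_at_least_one_site(length, density), 3))
# 10000 0.095
# 50000 0.393
# 200000 0.865
```

With these made-up numbers, a 200 kb gene is about nine times as likely as a 10 kb gene to land on the TF-containing list, with no biology involved.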
Should I normalize by giving each gene a score of (overlap size)/(gene size), and then sort the genes and select, say, the top 200 or 500?
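Here is a sketch of the (overlap size)/(gene size) score, assuming BED-style half-open intervals. Overlapping sites inside a gene are merged first so that no base is counted twice. The gene and site coordinates are toy values.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str = ""


def covered_bp(gene, sites):
    """Bases of `gene` covered by at least one site; overlapping sites are merged."""
    clipped = sorted(
        (max(s.start, gene.start), min(s.end, gene.end))
        for s in sites
        if s.chrom == gene.chrom and s.start < gene.end and s.end > gene.start
    )
    total = 0
    cur_start = cur_end = None
    for a, b in clipped:
        if cur_end is not None and a <= cur_end:
            cur_end = max(cur_end, b)  # extend the current merged block
        else:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = a, b  # start a new merged block
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def top_by_coverage_fraction(genes, sites, n):
    """Rank genes by (bp covered by sites) / (gene length); return the top n."""
    scored = [(covered_bp(g, sites) / (g.end - g.start), g.name) for g in genes]
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest fraction first, ties by name
    return scored[:n]


genes = [
    Interval("chr1", 0, 100, "A"),
    Interval("chr1", 1000, 3000, "B"),
    Interval("chr1", 5000, 5010, "C"),
]
sites = [
    Interval("chr1", 10, 30),
    Interval("chr1", 20, 40),
    Interval("chr1", 1000, 1100),
    Interval("chr1", 5000, 5010),
]
print(top_by_coverage_fraction(genes, sites, 2))  # [(1.0, 'C'), (0.3, 'A')]
```

The toy output shows a side effect worth checking on real data. A very short gene covered by a single site gets the maximum score, so the ratio can tilt the selection toward short genes instead of removing the length effect.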
Is the transcription factor more likely to be biologically relevant when it binds to promoters? If so, you could restrict the overlap to TSS ± 1 kb, which would give fragments of equal length.
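If one did take the promoter route, here is a minimal sketch of building strand-aware TSS ± 1 kb windows and testing them for overlap. It assumes BED6-style gene records with the strand in column 6, and it uses toy coordinates. Each window is 2 kb wide except where it is clipped at the chromosome start. Clipping at the chromosome end would need chromosome sizes, which this sketch does not handle.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (BED)
    end: int     # exclusive (BED)
    name: str
    strand: str  # "+" or "-"


def promoter_window(gene, flank=1000):
    """Half-open window TSS - flank .. TSS + flank, clipped at position 0."""
    # On the minus strand the TSS is the last base of the BED interval.
    tss = gene.start if gene.strand == "+" else gene.end - 1
    return gene.chrom, max(0, tss - flank), tss + flank


def genes_with_promoter_site(genes, sites, flank=1000):
    """Names of genes whose promoter window overlaps at least one (chrom, start, end) site."""
    hits = []
    for g in genes:
        chrom, win_start, win_end = promoter_window(g, flank)
        if any(c == chrom and s < win_end and e > win_start for c, s, e in sites):
            hits.append(g.name)
    return hits


genes = [
    Gene("chr1", 5000, 9000, "G1", "+"),
    Gene("chr1", 5000, 9000, "G2", "-"),
    Gene("chr1", 500, 2000, "G3", "+"),
]
print(promoter_window(genes[1]))                                 # ('chr1', 7999, 9999)
print(genes_with_promoter_site(genes, [("chr1", 4500, 4510)]))   # ['G1']
```

The site at 4500 lies upstream of G1's gene body but inside its promoter window, so only G1 is reported.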
No, the TF binds everywhere, and all of those sites are biologically relevant.