What is the recommended (hopefully free) tool for finding enrichment of transcription factor binding sites in a set of promoter sequences?
Check out the data and tools in the JASPAR database (free).
It depends on what you want to do, of course, but you might find some tools in the MEME suite that could help you: http://meme.sdsc.edu/meme/
A grad student here had this very question at the beginning of her thesis. Like others here, we used TRANSFAC motifs; I would do that again, adding JASPAR to the mix. At the time, she didn't know of any tools for this. We found two important considerations:
What defines the "promoter" or "gene control region" in human? We settled on 5000 bp of upstream sequence + exon 1 + intron 1 (either the entire intron or up to its first 1000 bp, I can't recall which). Why intron 1? Because many gene control elements are found there.
When looking for enrichment, how do you define your set of control genes? By size (given that we took exon 1 and intron 1 data)? By GO categories? By gene position (say, the neighboring gene)? This was tough, and your solution may be specific to the genes you're examining or the question(s) you are after.
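The region definition from the first consideration above (upstream sequence + exon 1 + intron 1) can be sketched as a strand-aware coordinate calculation. This is a minimal sketch, assuming you already have, per gene, the TSS as the 0-based coordinate of the first transcribed base, the strand, and the lengths of exon 1 and intron 1 from your annotation. The names `control_region` and `intron_cap` are hypothetical; pass `intron_cap=None` to take the whole intron, since the exact choice isn't remembered above.

```python
from typing import NamedTuple, Optional

class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive

def control_region(chrom: str, tss: int, strand: str, exon1_len: int,
                   intron1_len: int, upstream: int = 5000,
                   intron_cap: Optional[int] = 1000) -> Interval:
    """Upstream sequence + exon 1 + intron 1 (optionally capped), strand-aware.

    `tss` is the 0-based genomic coordinate of the first transcribed base.
    """
    intron = intron1_len if intron_cap is None else min(intron1_len, intron_cap)
    transcribed = exon1_len + intron  # bases taken downstream of the TSS
    if strand == "+":
        start, end = tss - upstream, tss + transcribed
    elif strand == "-":
        # Transcription runs towards lower coordinates, so "upstream" is to the right.
        start, end = tss - transcribed + 1, tss + upstream + 1
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    # Clamp at the chromosome start (the chromosome end is not checked here).
    return Interval(chrom, max(0, start), end)

print(control_region("chr1", 10000, "+", exon1_len=200, intron1_len=3000))
```

The intervals use the same 0-based, half-open convention as BED, so they can be written out and passed to a sequence extractor such as `bedtools getfasta`.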
The student then ran MAPPER to identify the TRANSFAC motifs.
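Once motif hits are identified (with MAPPER or any other scanner) in both the target and the control promoters, the enrichment step for a single motif often reduces to a 2x2 comparison: how many target vs. control promoters contain at least one hit. Below is a minimal sketch of a one-sided Fisher's exact test (the hypergeometric upper tail) using only the standard library. The function name and the "at least one hit per promoter" counting are illustrative assumptions, not what any particular tool in this thread does.

```python
from math import comb

def motif_enrichment_pvalue(target_hits: int, target_total: int,
                            bg_hits: int, bg_total: int) -> float:
    """One-sided Fisher's exact test: P(X >= target_hits) under the
    hypergeometric null, i.e. hits spread at random over all promoters."""
    n_all = target_total + bg_total   # all promoters (target + control)
    k_all = target_hits + bg_hits     # promoters with at least one hit
    upper = min(k_all, target_total)  # most hits the target set could hold
    tail = sum(comb(k_all, k) * comb(n_all - k_all, target_total - k)
               for k in range(target_hits, upper + 1))
    return tail / comb(n_all, target_total)

# Example: 30 of 100 target promoters vs. 50 of 900 control promoters have a hit.
print(motif_enrichment_pvalue(30, 100, 50, 900))
```

When many motifs are tested at once, the resulting p-values should be adjusted for multiple testing (e.g. Benjamini-Hochberg).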
The PAINT promoter analysis tool is my personal favorite. It takes a list of genes, finds the upstream regions automatically, passes them through the free version of TRANSFAC and then compares the enrichment to a background set of genes, either user-provided or chosen from built-in sets. Everything is highly automated and very customizable.
You could try PSCAN (http://159.149.109.9/pscan/); it uses TFBS matrices from both TRANSFAC and JASPAR.
For people working with plants, ELEMENT could be useful.
It searches not only for known motifs but also for enriched words.
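To give a sense of what searching for "enriched words" involves, here is a simplified sketch (not ELEMENT's actual algorithm) that counts, for each k-mer, how many target and background sequences contain it, and ranks words by a pseudocount-smoothed log2 ratio. The function names, `k=6`, and the pseudocount are illustrative assumptions; it also does not merge a word with its reverse complement, which a real tool would likely do.

```python
from collections import Counter
from math import log2

def kmer_presence(seqs, k):
    """For each k-mer, count how many sequences contain it at least once."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        words = {s[i:i + k] for i in range(len(s) - k + 1)}
        counts.update(w for w in words if set(w) <= set("ACGT"))  # skip N etc.
    return counts

def enriched_words(target, background, k=6, pseudo=1.0):
    """Rank k-mers by log2(target frequency / background frequency)."""
    t, b = kmer_presence(target, k), kmer_presence(background, k)
    nt, nb = len(target), len(background)
    scores = {}
    for w in set(t) | set(b):
        ft = (t[w] + pseudo) / (nt + 2 * pseudo)  # smoothed fraction of target seqs
        fb = (b[w] + pseudo) / (nb + 2 * pseudo)  # same for background seqs
        scores[w] = log2(ft / fb)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

print(enriched_words(["AAGATAAGG", "CCGATAACC"], ["CCCCCCCCC", "TTTTTTTTT"], k=5)[:3])
```

Counting presence per sequence rather than total occurrences keeps a single repeat-rich promoter from dominating the score.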
I've found Clover (http://cagt.bu.edu/page/Clover_about) quite useful, but you have to provide the matrices for the TF binding sites yourself. It also hasn't been updated in a long time.
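Since Clover expects you to supply the matrices, it may help to see what is usually done with them. The sketch below converts a count matrix into a log-odds position weight matrix and scans both strands of a sequence. It assumes counts are given as one [A, C, G, T] row per motif position (JASPAR lists one row per base, so transpose first); the uniform background, pseudocount, threshold and function names are illustrative assumptions, and this is not Clover's own over-representation statistic.

```python
from math import log2

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def log_odds_pwm(counts, background=(0.25, 0.25, 0.25, 0.25), pseudo=0.5):
    """counts: one [A, C, G, T] count row per motif position."""
    pwm = []
    for row in counts:
        total = sum(row) + 4 * pseudo
        pwm.append([log2((c + pseudo) / total / bg)
                    for c, bg in zip(row, background)])
    return pwm

def scan(seq, pwm, threshold):
    """Return (start, strand, score) for every window scoring >= threshold.
    `start` is the 0-based forward-strand coordinate of the window."""
    w = len(pwm)
    fwd = seq.upper()
    hits = []
    for strand, s in (("+", fwd), ("-", fwd.translate(COMPLEMENT)[::-1])):
        for i in range(len(s) - w + 1):
            window = s[i:i + w]
            if any(ch not in BASES for ch in window):
                continue  # skip windows containing N or other symbols
            score = sum(pwm[j][BASES.index(ch)] for j, ch in enumerate(window))
            if score >= threshold:
                start = i if strand == "+" else len(s) - i - w
                hits.append((start, strand, score))
    return hits

# Toy GATA-like matrix: counts of A, C, G, T at 4 positions.
gata = log_odds_pwm([[0, 0, 10, 0], [10, 0, 0, 0], [0, 0, 0, 10], [10, 0, 0, 0]])
print(scan("CCGATACCTATCCC", gata, threshold=5.0))
```

The threshold is arbitrary here; in practice it is usually set relative to the matrix's maximum score or from a score distribution on background sequence.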
I would suggest customizing your favorite GO enrichment tool so that the background gene list represents only the TFs, or genes with TF-related terms, and then performing the enrichment calculation. I tried this approach for a small analysis.
Another option is to use a published method such as Modulator Inference by Network Dynamics (MINDy). Disclaimer: I have not tried MINDy myself.
I'd second MEME as a conservative approach. Check which patterns are stable over a range of promoter sizes, promoter subsets and cutoffs. Once you have those, switch to TOMTOM (part of the MEME suite) to map them to JASPAR or TRANSFAC matrices.
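MEME itself has to be rerun for each setting, but a quick stability check on a motif you already have can be as simple as recomputing, for several promoter lengths, the fraction of promoters that contain a match. The sketch below does that with an IUPAC consensus and regular expressions on both strands. It is a swapped-in simplification, not part of the MEME suite; the function names, the example consensus `WGATAR`, and the assumption that sequences are given 5' to 3' ending at the TSS are all illustrative.

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def iupac_regex(consensus):
    """Compile a regex matching the IUPAC consensus on either strand."""
    fwd = consensus.upper()
    rev = fwd.translate(IUPAC_COMPLEMENT)[::-1]
    alternatives = {"".join(IUPAC[c] for c in m) for m in (fwd, rev)}
    return re.compile("|".join(sorted(alternatives)))

def hit_fraction_by_length(upstream_seqs, consensus, lengths):
    """For each n in `lengths`, the fraction of promoters with at least one
    match in their TSS-proximal n bases (sequences end at the TSS)."""
    rx = iupac_regex(consensus)
    fractions = {}
    for n in lengths:
        hits = sum(1 for s in upstream_seqs if rx.search(s.upper()[-n:]))
        fractions[n] = hits / len(upstream_seqs)
    return fractions

toy = ["TGATAA" + "C" * 8, "C" * 8 + "TTATCA"]
print(hit_fraction_by_length(toy, "WGATAR", [6, 14]))
```

Comparing these fractions between target and control promoters at each length shows whether an apparent enrichment holds up or appears only at one particular promoter size.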
That's actually one major difference between the various tools. Dave, do you have a list of genes or a set of promoter sequences? Many tools expect a list of genes because they have their own definition of what a promoter is. If you have CAGE or RNA-Seq data and want to define which promoter you are interested in, half of the existing systems won't be of use to you. Likewise, if you are working with a species the system doesn't support, you're out of luck.
In that case, a number of systems such as CisRed won't make sense to use, as they have their own fixed definitions of start sites and promoters. The second question is how many promoters you have. If it's just a few, you will probably have to resort to systems that use phylogenetic footprinting to increase your chances of finding functional binding sites.
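Phylogenetic footprinting tools do considerably more (multiple species, alignment-aware scoring), but the core idea can be sketched in a few lines: keep only those sites that occur at the same columns of a pairwise alignment in both species. The exact-match site, the two-species alignment given as gapped strings, and the function name are simplifying assumptions for illustration.

```python
COMP = str.maketrans("ACGT", "TGCA")

def conserved_sites(aln_ref: str, aln_orth: str, site: str) -> list[int]:
    """Ungapped 0-based reference positions where `site` (or its reverse
    complement) occurs identically, without gaps, at the same alignment
    columns in both sequences."""
    if len(aln_ref) != len(aln_orth):
        raise ValueError("aligned sequences must have equal length")
    site = site.upper()
    targets = {site, site.translate(COMP)[::-1]}
    w = len(site)
    # Map each alignment column to its ungapped reference coordinate.
    ref_pos, pos = [], 0
    for ch in aln_ref:
        ref_pos.append(None if ch == "-" else pos)
        if ch != "-":
            pos += 1
    ref_u, orth_u = aln_ref.upper(), aln_orth.upper()
    hits = []
    for i in range(len(ref_u) - w + 1):
        a, b = ref_u[i:i + w], orth_u[i:i + w]
        # Targets contain only bases, so windows with gaps never match.
        if a == b and a in targets:
            hits.append(ref_pos[i])
    return hits

print(conserved_sites("C-CGATA", "CCCGATA", "GATA"))
```

Requiring identical sites is very strict; a PWM score threshold in each species is the more common relaxation.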
Maybe RSAT? It seems to have a fairly broad collection of useful tools for building and scanning motifs and CRMs (cis-regulatory modules), although I haven't used them myself yet, so I can't tell you much about them. The website and web services are free, but I think you have to register by post(?!) to install the tools locally. http://rsat.ulb.ac.be/rsat/
For plants, you can find enriched transcription factor binding sites in a set of promoter sequences using the tool provided by PlantRegMap.
Unfortunately, not much is available for nematodes.