I would like you to suggest databases of annotated regulatory sequences, such as transcription factor binding sites, in a given promoter. Ideally, the tool should be based on the literature. An example of the desired output would be the sequence of the Nanog promoter in mouse, with the known binding sites and other regulatory elements marked in some way. Databases of regulatory elements alone are also welcome.
In Ensembl we do a so-called "regulatory build" for human and mouse, based on genome-wide ChIP-Seq data, that results in a set of "best guess" regulatory elements. For detailed information please have a look here.
The results of this regulatory build are shown on the "Regulation" view under the Gene tab (example) or can be added as tracks to the "Region in detail" view under the Location tab using the [Configure this page] option.
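If you would rather script this than browse, the regulatory features can, as far as I know, also be retrieved through the Ensembl REST API `overlap/region` endpoint with `feature=regulatory`. The following is only a minimal sketch: the helper names (`overlap_url`, `fetch_features`, `summarize`) are hypothetical, the JSON field names (`id`, `start`, `end`, `description`, `feature_type`) are assumed from typical responses and should be checked against the current REST documentation, and the region you pass in has to be looked up for your assembly.

```python
import json
import urllib.parse
import urllib.request

# Public Ensembl REST server (assumed to be the current default host).
ENSEMBL_REST = "https://rest.ensembl.org"


def overlap_url(species, region, features=("regulatory",)):
    """Build an overlap/region URL, e.g. for species 'mus_musculus'
    and a region written as 'chromosome:start-end'."""
    query = urllib.parse.urlencode([("feature", f) for f in features])
    path = "/overlap/region/{}/{}".format(
        urllib.parse.quote(species),
        urllib.parse.quote(region, safe=":-"),
    )
    return ENSEMBL_REST + path + "?" + query


def fetch_features(url):
    """Download and decode the JSON list of features (needs network access)."""
    request = urllib.request.Request(
        url, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize(features):
    """Return (start, end, id, label) tuples sorted by start position.

    The keys used here are assumptions about the response format;
    missing keys fall back to None or 'unknown'.
    """
    rows = []
    for feature in features:
        label = feature.get("description") or feature.get("feature_type") or "unknown"
        rows.append((feature.get("start"), feature.get("end"), feature.get("id"), label))
    return sorted(rows, key=lambda row: row[0])
```

Usage would be something like `summarize(fetch_features(overlap_url("mus_musculus", region)))`, where `region` is the promoter interval of interest. Keeping the URL building and the summarizing separate from the download means both can be checked without network access.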
There are two main open-access databases of regulatory sequence information curated from the literature: ORegAnno and PAZAR. These projects have lost some momentum with the advent of ChIP-Seq, but there are now efforts to curate ChIP-Seq data from the literature, such as hmChIP and the Ensembl regulatory build mentioned by Bert.
You might take a look at the UCSC Genome Browser with several of the gene regulatory tracks turned on. Take a look here for an example. If you scroll down to the bottom of the page, you will see which tracks are turned on. The amount of data for human is much larger, but DNase I hypersensitivity is a very good general marker of open chromatin and of the regulatory potential of a genomic location in a particular cellular context.
The problem with UCSC, in my opinion, is that when you try to download regulatory sequences from several genomes by scripting, you have to deal with fields that vary considerably from one species to another. Is there a way to overcome this?
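One possible way around the field variability is to map each downloaded table onto a small common schema before doing anything else. The following is a minimal sketch, not a tested pipeline: it assumes the tables were exported as tab-separated text with a header row (optionally starting with `#`), and the alias lists are assumptions based on column names that commonly appear in UCSC tables, so they would need extending for the tracks you actually use. The function names are hypothetical.

```python
import csv
import io

# Canonical field -> column names that may carry it, in order of preference.
# These aliases are assumptions; extend them for the tables you download.
ALIASES = {
    "chrom": ("chrom", "genoName", "tName"),
    "start": ("chromStart", "genoStart", "tStart", "txStart"),
    "end": ("chromEnd", "genoEnd", "tEnd", "txEnd"),
    "name": ("name", "repName", "qName"),
}


def normalize_row(row):
    """Map one table row (a dict keyed by column name) onto the common schema."""
    out = {}
    for canonical, candidates in ALIASES.items():
        # Take the first alias that is present and non-empty, else None.
        out[canonical] = next(
            (row[c] for c in candidates if row.get(c) not in (None, "")), None
        )
    if out["chrom"] is None or out["start"] is None or out["end"] is None:
        raise ValueError("no recognizable coordinate columns in: %r" % sorted(row))
    out["start"] = int(out["start"])
    out["end"] = int(out["end"])
    return out


def normalize_table(text):
    """Parse a tab-separated table with a header line into normalized rows."""
    # Drop a leading '#' from the header line if there is one.
    reader = csv.DictReader(io.StringIO(text.lstrip("#")), delimiter="\t")
    return [normalize_row(row) for row in reader]
```

With this, tables from different species or tracks all come out as dictionaries with `chrom`, `start`, `end` and `name`, and a table with unrecognized coordinate columns fails loudly instead of being silently misread.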