Hi,
Is anyone aware of a database of DNA binding proteins?
D.
DNA binding is identified by the Gene Ontology annotation GO:0003677. Many protein databases carry GO annotations that you can filter by.
Swiss-Prot, or PDB if you need structures.
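As a rough illustration of the GO-based filtering suggested above, here is a minimal sketch that builds a UniProt search URL for GO:0003677. The endpoint, the query syntax and the field names reflect my understanding of UniProt's REST interface and should be checked against its current documentation; the helper names `build_dna_binding_query` and `fetch_tsv` are made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed UniProtKB search endpoint; verify against the UniProt docs.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def build_dna_binding_query(reviewed_only=True, taxon_id=None, size=500):
    """Build a search URL for proteins annotated with GO:0003677 (DNA binding)."""
    terms = ["(go:0003677)"]
    if reviewed_only:
        # reviewed:true restricts the search to Swiss-Prot entries.
        terms.append("(reviewed:true)")
    if taxon_id is not None:
        # NCBI taxonomy ID, e.g. 9606 for human.
        terms.append(f"(organism_id:{taxon_id})")
    params = {
        "query": " AND ".join(terms),
        "format": "tsv",
        "fields": "accession,id,protein_name,organism_name",
        "size": size,
    }
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


def fetch_tsv(url):
    """Download one page of results as tab-separated text (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Print the URL only; call fetch_tsv(url) to actually download results.
    print(build_dna_binding_query(taxon_id=9606))
```

This returns a single page of results; larger result sets would need pagination or UniProt's bulk download options.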
My source is the FANTOM consortium, which lists some 2000 TFs for human. See Table S1, a list of human TFs, in Ravasi et al. (2010, Cell 140: 744-752), which describes transcription factors. From the website: "FANTOM has developed and expanded over time to encompass the fields of transcriptome analysis. The object of the project is moving steadily up the layers in the system of life, progressing thus from an understanding of the ‘elements’ - the transcripts - to an understanding of the ‘system’ - the transcriptional regulatory network."
Do you also need the possible sites they recognize? Also, do you want all available sequences or only one given species?
ProteinLounge offers a feature called the Protein Interaction Database (http://www.proteinlounge.com/Database/Databases.aspx), which lists binding sites for some proteins.
TRANSFAC is pretty good, and not necessarily covered by UniProt:
http://www.gene-regulation.com/pub/databases.html
For yeast, the YEASTRACT database is good, capturing ChIP-seq data:
DNABP is a database and accompanying manuscript, from late 2016, presenting a machine learning method (Random Forest) that identifies de novo DNA-binding proteins using only sequence information: 1) the conservation of physicochemical properties of the amino acids, and 2) the binding propensity of DNA-binding residues.
They took 14,262 proteins from UniProt that they could confidently label as DNA-binding or non-DNA-binding and used these as their training data set; you can download this information from supplement S1. The DNA-binding and non-binding UniProt accessions used for the model's test set are also available in the supplements. Although the method achieved high accuracy (~83-90%), the web server accepts only a single sequence at a time, so it is not really suited to classifying a large number of de novo DNA-binding proteins.
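If you do want to push a batch through a one-sequence-at-a-time server, a first step is to split a multi-FASTA file into individual records. The sketch below does only that; `split_fasta` is a hypothetical helper, and the submission step is left out because the server's interface is not described here.

```python
def split_fasta(text):
    """Split multi-FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines are joined into one string per record.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    example = ">protA\nMKT\nLLV\n>protB\nGGS\n"
    for name, seq in split_fasta(example):
        print(name, len(seq))
```

Each tuple can then be submitted or pasted on its own.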
If anyone knows of a better/more-comprehensive resource available today I'd be happy if they could share it.
In the future, it is good practice to place questions of the original poster in the comments section after the question as opposed to in an answer as you have done.