According to the article (DOI:10.1126/science.1066355), there are 2000 to 3000 transcription factors. How many of them have a known TFBS (transcription factor binding site) in humans? The answer could be derived bioinformatically or given by a citation.
I would check out http://jaspar.genereg.net/. It has collections of known transcription factor binding sites. I didn't actually count them but it looks like around 150-300.
If I look at the TRANSFAC release dated Mar 2010, there are 1300 TFBS matrices given. This includes minor variations like V$P53_03 V$P53_04 V$P53_05 and so on. If I restrict my search to vertebrates the number is 908. If I ignore the part after the '_', the number works out to 598. While this is vertebrates and not just humans you could get some idea of the numbers involved.
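For anyone who wants to repeat this kind of count, here is a minimal sketch of the collapsing step. It assumes the matrix identifiers are already in a list, that the `V$` prefix marks vertebrate matrices, and that "the part after the '_'" means everything from the first underscore onward. The sample identifiers and the function name `count_matrix_families` are made up for illustration; a real run would read the identifiers from a TRANSFAC matrix file.

```python
def count_matrix_families(matrix_ids):
    """Count distinct matrix names after dropping everything from the first '_'."""
    # "V$P53_03" and "V$P53_04" both collapse to "V$P53".
    families = {matrix_id.split("_", 1)[0] for matrix_id in matrix_ids}
    return len(families)


# Hypothetical sample, not real release contents.
sample_ids = ["V$P53_03", "V$P53_04", "V$P53_05", "V$OCT1_01", "I$HSF_01"]

# Assumption: vertebrate matrices carry the "V$" prefix.
vertebrate_ids = [m for m in sample_ids if m.startswith("V$")]

print(len(sample_ids))                        # all matrices
print(len(vertebrate_ids))                    # vertebrate matrices only
print(count_matrix_families(vertebrate_ids))  # after collapsing variants
```

Splitting on the first underscore is the most aggressive choice, so a name with several underscores loses all of its suffixes. Splitting on the last underscore instead (`rsplit`) would only strip the trailing version number and would give a higher count.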
I have 245 human transcription factors with motifs described in TRANSFAC and/or JASPAR.
I also use data published by Xie, et al (Xie, Z., Hu, S.H., Blackshaw, S., Zhu, H. and Qian, J. (2009) hPDI: a database of experimental human protein-DNA interactions, Bioinformatics. (In press); http://bioinfo.wilmer.jhu.edu/PDI/) which gives sequence motifs for known and non-traditional DNA-protein interactions. From those data, I extracted 1015 genes.
Thanks! I heard one of the authors speak at RECOMB satellite conf in 2009. For me this is an interesting resource that has value for a given gene on a case by case basis. In other words, I am looking for other data on the gene before I put a lot of value on the PDI data. Nonetheless, their data make one think about the fine tuning of transcription based on a multitude of inputs from proteins with various functions.
According to this site the answer is around 130: http://oreganno.org/tfview/cgi-bin/specieslist.pl
Does that seem reasonable? Are there different answers?
The data are spread over at least three databases: TRANSFAC, JASPAR and UniPROBE. You could drop TRANSFAC; it's a rather strange data source. You would need to parse the (human) gene identifiers from JASPAR and UniPROBE and then take the union of them.
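The union step could look roughly like the sketch below. It assumes you have already parsed the human gene symbols out of each database into plain lists; the parsing itself depends on each database's export format and is not shown. The symbol lists and the helper names `normalize` and `union_tf_symbols` are hypothetical.

```python
def normalize(symbols):
    """Strip whitespace and upper-case symbols so the same gene matches across sources."""
    return {s.strip().upper() for s in symbols if s.strip()}


def union_tf_symbols(*sources):
    """Return the set of distinct gene symbols found in any of the sources."""
    merged = set()
    for source in sources:
        merged |= normalize(source)
    return merged


# Hypothetical symbol lists standing in for the parsed database exports.
jaspar_symbols = ["TP53", "SP1", "gata1 "]
uniprobe_symbols = ["SP1", "GATA1", "HNF4A"]

all_tfs = union_tf_symbols(jaspar_symbols, uniprobe_symbols)
print(len(all_tfs), sorted(all_tfs))
```

Matching on gene symbols is the simplest option but it is sensitive to synonyms. Mapping everything to a stable identifier first would be more robust if the databases provide one.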
Just to clarify, "derived bioinformatically" should mean "using a database of known TFBS". Predicted binding sites are not necessarily real binding sites.
UHU! Tough question!!! No bounty?
I have no ability to provide bounties, sorry.
Agreed Neil, I think only experimentally verified sites can count as 'real' TFBS.
So 150-300 is the number of predicted binding sites, and the number of experimentally verified sites is lower? I guess the next question is how many of the 150-300 are experimentally verified.
I think this question needs to be posed more tightly to be answered properly. Which binding sites count: only those in the genome, or also those identified by in vitro selection? Do data from footprinting, EMSA, mutational analysis, and/or ChIP all count as valid TFBSs? Are you asking how many sequence-specific TFs have TFBSs, or how many of all known TFs have a known TFBS (including TFs that do not have sequence-specific action, e.g. histone-modifying factors)? Finally, and most difficult to answer, what is your definition of a TF?
The context of this question is: what would you say if you were asked this in a thesis defense? I was not asked, but I was told to be prepared for questions like that. All of the follow-up questions you asked, @Case Bergman, are useful.