Is there a list of all human transcription factors and their respective Ensembl gene IDs?
I have been trying to find one but haven't been able to do so. In this paper http://www.sciencedirect.com/science/article/pii/S0959440X04000788 they mention that there are around 2,600 proteins that function as TFs, but they do not provide a table or a list as supplementary material.
I am curious to know if someone has run into such a table.
Perhaps a better starting point than GO accessions would be a TF database. A couple of examples (easily found via web search for "transcription factor" + database):
1,799 results for "transcription factor" AND reviewed:yes AND organism:"Homo sapiens (Human) [9606]" in UniProtKB
These are name matches, so the set is not completely clean: you also get some co-factors. You could pick out the InterPro domains and/or look at the GO terms that fit, and make a union.
You can then use "Customise results" to get the Ensembl ID or any other x-ref.
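The union step could be sketched roughly like this. It is only an illustration: the accessions, the Ensembl IDs and the helper name union_candidates are made-up placeholders, not real query results. In practice each set would come from an exported UniProtKB result list filtered by name, InterPro domain or GO term.

```python
# Placeholder accessions (hypothetical); real sets would be read from
# exported query results, one set per selection criterion.
by_name = {"P00001", "P00002", "P00003"}
by_interpro = {"P00002", "P00004"}
by_go = {"P00003", "P00004", "P00005"}

# Placeholder accession -> Ensembl gene ID mapping (hypothetical values),
# standing in for the x-ref column added via "Customise results".
acc_to_ensembl = {
    "P00002": "ENSG_PLACEHOLDER_2",
    "P00003": "ENSG_PLACEHOLDER_3",
    "P00004": "ENSG_PLACEHOLDER_4",
}


def union_candidates(*candidate_sets):
    """Merge any number of accession sets and return a sorted list."""
    merged = set()
    for candidates in candidate_sets:
        merged |= candidates
    return sorted(merged)


# Union of the domain-based and GO-based sets only, leaving out the
# noisier name matches.
candidates = union_candidates(by_interpro, by_go)

# Keep the Ensembl ID where a mapping exists; None marks a missing x-ref.
table = [(acc, acc_to_ensembl.get(acc)) for acc in candidates]

for acc, ensembl_id in table:
    print(acc, ensembl_id)
```

Entries that come back with no Ensembl ID are worth checking by hand rather than dropping silently.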
To say the numbers we are digging out are discordant is putting it mildly:
DBD: 2,886 (predicted) entries for H. sapiens
Babu et al.: 2,600
FANTOM set: 1,988
UniProt: 1,799
AnimalTFDB: 1,544 entries for H. sapiens
Vaquerizas et al.: 1,391
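To put a number on how discordant these are, here is a quick sketch using only the counts listed above. The variable names are just for illustration.

```python
# Counts as quoted in this thread, keyed by source.
tf_counts = {
    "DBD": 2886,
    "Babu et al": 2600,
    "FANTOM": 1988,
    "UniProt": 1799,
    "AnimalTFDB": 1544,
    "Vaquerizas et al": 1391,
}

# Largest and smallest estimates as (source, count) pairs.
highest = max(tf_counts.items(), key=lambda item: item[1])
lowest = min(tf_counts.items(), key=lambda item: item[1])

# Fold difference between the most and least inclusive lists.
fold = highest[1] / lowest[1]

print(f"{highest[0]}: {highest[1]}")
print(f"{lowest[0]}: {lowest[1]}")
print(f"fold difference: {fold:.2f}")
```

So the most inclusive list is roughly twice the size of the most stringent one.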
Some people are less stringent in calling a protein a TF. I am currently working on the curation of a list of human TFs (not published yet), and we are reaching around 1,500 TFs when requiring experimental validation of both DNA binding AND an effect on gene expression.
I am far from being the leader of this project, which is part of the new FANTOM5, and as far as I know it is not supposed to be linked to UniProt. But the list of TFs will be available when the paper is published (it is close to submission now). I will let you know if you want.
Thanks, but please tell your esteemed project leader, who almost certainly knows some UniProt folk anyway, that your hard-grafted expert annotation belongs exactly in the UniProt x-ref lines (or keyword field depending on how best to slot it in). You then don't need to send me anything because I can just query the clean set out (but if your paper is not OA I would certainly appreciate a PDF).
TRANSFAC and JASPAR CORE are curated TF databases, which include human transcription factors. The second database is free and aims at non-redundancy, while the first is not free and is redundant. Both databases can include accession information for recovering protein names from external sites.
JASPAR is a database of TFs for which we have motifs. TFCat has also been developed by our lab and aims to provide a curated catalog of mouse and human transcription factors.
http://www.tfcheckpoint.org/ is a manually curated list of mammalian transcription factors, including sequence-specific DNA-binding TFs (dbTFs).