6.2 years ago
a.rex
I recently ran InterPro on predicted ORFs >100 aa. I then used the PFAM_DBD and SUPERFAMILY_DBD database IDs in the hope of collecting TFs. Of course, many genes have both a Homeobox hit (PF00046) and another hit such as PAX (PF00292).
My question is, how do people make a prediction for the number of TFs?
I simply took all the TF hits in the list and removed duplicates. Would this be a valid way to identify the total number of TFs?
But how can I account for specific families?
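To make the double-counting issue concrete, here is a minimal sketch. It assumes the InterPro results have already been reduced to (gene ID, domain accession) pairs; the gene names and the `count_tfs` helper are hypothetical and only for illustration.

```python
from collections import defaultdict

# Hypothetical (gene_id, domain_accession) pairs, e.g. parsed from InterPro output.
hits = [
    ("gene1", "PF00046"),  # Homeobox
    ("gene1", "PF00292"),  # PAX, same gene as above
    ("gene2", "PF00046"),
    ("gene2", "PF00046"),  # repeated hit of the same domain in one gene
    ("gene3", "PF00292"),
]

def count_tfs(hits):
    """Group hits by gene and by domain, collapsing duplicates via sets."""
    domains_by_gene = defaultdict(set)
    for gene, domain in hits:
        domains_by_gene[gene].add(domain)
    genes_by_domain = defaultdict(set)
    for gene, domains in domains_by_gene.items():
        for domain in domains:
            genes_by_domain[domain].add(gene)
    return domains_by_gene, genes_by_domain

domains_by_gene, genes_by_domain = count_tfs(hits)

# Number of distinct genes with at least one DNA-binding-domain hit.
n_tf_genes = len(domains_by_gene)

# Genes per domain; a gene with two different domains is counted under both.
per_domain = {d: len(genes) for d, genes in genes_by_domain.items()}

print(n_tf_genes)
print(per_domain)
```

In this toy data the per-domain counts add up to more than the number of genes, because gene1 is counted under both domains. That is why removing duplicates per gene gives a gene count but not a per-family count.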
What kind of TF count are you looking for: how many different types of TFs there are in the genome, or how many genes are potentially TFs?
How many different types of TFs. Thanks
The more difficult one, then ;)
That sounds like a reasonable approach. How do you deal with a case like the one you described (one gene, multiple hits)? And what exactly do you mean by "how can I account for specific families"?
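One possible way to handle the one-gene, multiple-hits case is to assign each gene to a single family from its full set of domains, using ordered rules where the first match wins. The sketch below is only an illustration of that idea: the rule list, the family labels, and the `assign_family` helper are assumptions, not an established classification scheme.

```python
from collections import Counter

# Hypothetical ordered rules: (required domains, family label). First match wins,
# so a gene with both PAX and Homeobox domains is labelled PAX here.
RULES = [
    (frozenset({"PF00292"}), "PAX"),
    (frozenset({"PF00046"}), "Homeobox"),
]

def assign_family(domains, rules, default="Unclassified"):
    """Return the label of the first rule whose required domains are all present."""
    for required, family in rules:
        if required <= domains:
            return family
    return default

# Hypothetical per-gene domain sets (duplicates already collapsed).
gene_domains = {
    "gene1": {"PF00046", "PF00292"},
    "gene2": {"PF00046"},
    "gene3": {"PF00292"},
}

families = {gene: assign_family(domains, RULES) for gene, domains in gene_domains.items()}
family_counts = Counter(families.values())

print(families)
print(family_counts)
```

With rules like these each gene lands in exactly one family, so the family counts sum to the number of TF genes. The ordering of the rules is a modelling decision that would need to be justified from the literature.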
Perhaps you would be better off in the end by first creating gene families and then annotating them family-wise, based on the genes in each family.
Looking into the literature might help as well. There are quite a few TF database resources, and the papers describing them might give you some ideas. Examples: PlantTFDB, PlnTFDB (check the citation section at the bottom of the page).