I would like to find all characterized proteins of a particular protein family from land plants.
UniProt/SwissProt doesn't contain up-to-date protein sequence information. TrEMBL contains some characterized proteins, along with many computationally predicted and annotated protein sequences. Sequences that are characterized and listed in TrEMBL may still be annotated as "predicted" or "putative".
For example, a heterologously expressed plant protein characterized in 2011 (the expressed sequence was originally cloned from the cDNA of the plant's mRNA library) is nowhere to be found in the SwissProt database, but it is listed as a "putative" protein in TrEMBL.
What I would like to do is annotate a phylogenetic tree based on the functions of characterized proteins that fall within each clade. I end up with inadequate annotation because of this.
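To make the problem concrete, here is a minimal Python sketch of the annotation step I have in mind. The leaf IDs, clade labels, and description strings are all made up for illustration (they are not real database entries), and treating any description containing "putative" or "predicted" as uncharacterized is my own simplifying assumption.

```python
from collections import defaultdict

# Hypothetical example data: tree leaf ID -> (clade label, database description).
LEAVES = {
    "leaf_1": ("clade_A", "Example synthase 1"),
    "leaf_2": ("clade_A", "Putative example synthase 2"),
    "leaf_3": ("clade_B", "Putative uncharacterized protein"),
    "leaf_4": ("clade_B", "Predicted protein"),
}

# Assumption: these words in a description mean "not experimentally characterized".
UNCERTAIN_WORDS = ("putative", "predicted")


def is_uncertain(description):
    """Return True if the description is flagged as putative or predicted."""
    text = description.lower()
    return any(word in text for word in UNCERTAIN_WORDS)


def annotate_clades(leaves):
    """Collect, per clade, the descriptions that are not flagged as uncertain."""
    functions = defaultdict(set)
    for clade, description in leaves.values():
        functions[clade]  # make sure the clade is listed even if nothing qualifies
        if not is_uncertain(description):
            functions[clade].add(description)
    return {clade: sorted(found) for clade, found in functions.items()}


if __name__ == "__main__":
    for clade, found in annotate_clades(LEAVES).items():
        print(clade, found if found else "no characterized members found")
```

In this toy case clade_B ends up with no annotation at all, even if one of its "putative" members has in fact been characterized in the literature.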
Does anybody know of a better database or another way to do this? I can do it manually, but I would obviously like to avoid that if there's a better way. I have tried CharProtDB with no luck.
Thanks!