As I understand it, Pfam contains data on protein families and profile HMMs that can be used to annotate domains on novel proteins in newly sequenced genomes.
If you had a set of Pfam protein domains, would you expect there to be much functional redundancy at the domain level? E.g., two domains that Pfam annotates as different, but which perform similar functions (with differences perhaps driven by adaptation to different ecological niches)?
There will be a lot of redundancy, which is why families are grouped into clans. Yet the main cause of redundancy may not be what you assumed above. There are numerous protein superfamilies (e.g., alpha/beta hydrolases, nucleases, methyltransferases, dehydrogenases) that are each represented by many protein families. Within the same superfamily (or clan, as Pfam calls it), these domains will generally perform the same type of chemical reaction, but often on a different substrate. Since hidden Markov models are very sensitive, in many cases multiple related Pfam families will match the same region with statistical significance.
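One common way to deal with several same-clan families hitting the same region is to keep only the best-scoring hit among the overlapping ones. Here is a minimal Python sketch of that idea. The `Hit` type, the `resolve_clan_overlaps` function, the family and clan names, and the scores are all made up for illustration; real hits would come from your HMM search output and the family-to-clan mapping from Pfam's clan data.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """One domain match on a protein (hypothetical, simplified record)."""
    family: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    score: float  # bit score; higher is better


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two hits share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def resolve_clan_overlaps(hits, family_to_clan):
    """Keep the best-scoring hit among overlapping hits from the same clan.

    Hits from families with no clan, or from different clans, are never
    removed, even if they overlap.
    """
    kept = []
    # Visit hits from best to worst score so the first one seen wins.
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        clan: Optional[str] = family_to_clan.get(hit.family)
        clash = any(
            clan is not None
            and family_to_clan.get(k.family) == clan
            and overlaps(hit, k)
            for k in kept
        )
        if not clash:
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)


# Illustrative placeholder data, not real Pfam identifiers or scores.
family_to_clan = {"FamA": "Clan1", "FamB": "Clan1", "FamC": "Clan2"}
hits = [
    Hit("FamA", 10, 200, 85.0),
    Hit("FamB", 15, 190, 60.2),   # same clan as FamA, same region
    Hit("FamC", 250, 400, 45.1),
    Hit("FamD", 20, 100, 30.0),   # not in any clan
]

resolved = resolve_clan_overlaps(hits, family_to_clan)
for h in resolved:
    print(h.family, h.start, h.end, h.score)
```

Here FamB is dropped because it overlaps the higher-scoring FamA and belongs to the same clan, while FamD survives because it is not assigned to a clan. Whether you want to discard the weaker same-clan hits or keep them as evidence of superfamily membership depends on what you are using the annotation for.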
Ah very interesting, thanks for your reply and for pointing me towards clans, which gave some very useful reading!