Hello, I am working on annotating the members of gene families in recently sequenced genomes for which the transcripts have been determined. Is there a tool that will let me decompose my query sequence into domains and then rank the results by how many of those domains they contain? For example, I would like to set an E-value cutoff for each domain and rank the results in descending order of the number of domains shared with the query.
I am currently using profile hidden Markov models. My problem is that if the query sequence contains common domains, biologically irrelevant sequences can dominate the rankings on the strength of a single matching domain, which means biologists have to spend a great deal of effort manually parsing the results for true matches. I really need a tool or ranking score that will help me solve this problem, or suggestions on how I can use existing packages such as HMMER and public repositories such as Pfam to solve it. Has anyone encountered a similar problem, or does anyone have a perspective on a general solution?
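To make the ranking I have in mind concrete, here is a minimal Python sketch. It assumes the per-domain hits have already been collected as (target, domain, E-value) tuples, for example by parsing a tabular HMMER output such as the one written with `--domtblout`. The function name, the domain labels, the gene names, and the cutoff values are all made up for illustration, and this is only a rough idea of the scoring, not a finished method.

```python
from collections import defaultdict

def rank_by_shared_domains(hits, cutoffs, default_cutoff=1e-5):
    """Rank targets by how many query domains they match.

    hits:    iterable of (target_id, domain_name, evalue) tuples
    cutoffs: dict mapping domain_name -> maximum accepted E-value
             (domains missing from the dict use default_cutoff)
    Returns a list of (target_id, set_of_matched_domains), sorted by
    descending number of matched domains, ties broken by target_id.
    """
    shared = defaultdict(set)
    for target, domain, evalue in hits:
        # Keep a domain match only if it passes that domain's own cutoff.
        if evalue <= cutoffs.get(domain, default_cutoff):
            shared[target].add(domain)
    return sorted(shared.items(), key=lambda kv: (-len(kv[1]), kv[0]))

# Hypothetical example data: three query domains, three candidate genes.
hits = [
    ("geneA", "dom1", 1e-40),
    ("geneA", "dom2", 1e-12),
    ("geneB", "dom1", 1e-55),
    ("geneC", "dom1", 1e-3),   # fails the dom1 cutoff below
    ("geneC", "dom2", 1e-20),
    ("geneC", "dom3", 1e-9),
]
cutoffs = {"dom1": 1e-10, "dom2": 1e-5, "dom3": 1e-5}

ranking = rank_by_shared_domains(hits, cutoffs)
for target, domains in ranking:
    print(target, len(domains), sorted(domains))
```

With a scheme like this, a sequence that matches only one common domain (geneB above) ends up below sequences that share two domains with the query, even though its single match has the best E-value.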
Thank you very much for your time, Adam