Question

Gene Annotation By Protein Domain Match Criteria

13.3 years ago

Burlappsack ▴ 690

Hello, I am working on annotating the members of gene families in recently sequenced genomes, where the transcripts have already been determined. Is there a tool that will let me decompose my query sequence into domains, then rank the results by how many of those domains they include? For example, set an E-value cutoff for each domain, then rank the results in descending order of the number of domains shared with the query.

I am using profile hidden Markov models now. My problem is that if the query sequence contains common domains, biologically irrelevant sequences can dominate the rankings on the strength of a single matching domain, and biologists then have to spend a great deal of effort manually parsing the results for real matches. I really need a tool or a ranking score that will help me solve this problem, or suggestions on how I can use existing packages like HMMER and public repositories like Pfam to solve it. Has anyone encountered a similar problem, or does anyone have a perspective on a general solution?

Thank you very much for your time, Adam
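
The ranking described in the question can be sketched roughly as follows. This is only a minimal illustration, not an existing tool: it assumes the domain hits have already been extracted into (sequence ID, domain, E-value) tuples, for example from HMMER's tabular output, and the function names, the default cutoff, and the toy PFAM1 to PFAM9 labels are all hypothetical.

```python
from collections import defaultdict


def domains_per_sequence(hits, cutoffs, default_cutoff=1e-5):
    """Map each sequence ID to the set of domains that pass their E-value cutoff.

    hits: iterable of (seq_id, domain, evalue) tuples (assumed input format).
    cutoffs: dict of per-domain E-value cutoffs; other domains use default_cutoff.
    """
    found = defaultdict(set)
    for seq_id, domain, evalue in hits:
        if evalue <= cutoffs.get(domain, default_cutoff):
            found[seq_id].add(domain)
    return found


def rank_by_shared_domains(query_domains, found):
    """Rank sequences by how many of the query's domains they contain (descending)."""
    query = set(query_domains)
    scored = []
    for seq_id, domains in found.items():
        shared = query & domains
        if shared:
            # Keep the count and the fraction of query domains covered.
            scored.append((seq_id, len(shared), len(shared) / len(query)))
    # Sort by shared-domain count descending, then by ID for a stable order.
    scored.sort(key=lambda row: (-row[1], row[0]))
    return scored


# Hypothetical toy data: domain labels and E-values are made up.
hits = [
    ("seqA", "PFAM1", 1e-30), ("seqA", "PFAM2", 1e-12), ("seqA", "PFAM3", 1e-8),
    ("seqB", "PFAM1", 1e-40),
    ("seqC", "PFAM1", 1e-20), ("seqC", "PFAM2", 1e-3),
    ("seqD", "PFAM9", 1e-50),
]
cutoffs = {"PFAM2": 1e-10}  # stricter cutoff for one domain
found = domains_per_sequence(hits, cutoffs)
ranking = rank_by_shared_domains(["PFAM1", "PFAM2", "PFAM3"], found)
for seq_id, n_shared, fraction in ranking:
    print(seq_id, n_shared, round(fraction, 2))
```

With this scoring, a sequence that matches only one common domain (seqB here) cannot outrank one that shares all three query domains (seqA), no matter how strong its single hit is.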

gene prediction next-gen • 2.0k views

updated 13.3 years ago by Lee Katz ★ 3.2k • written 13.3 years ago by Burlappsack ▴ 690

score 1 · Answer 1 · 2012-05-05

I don't work on this specific issue, but I'd imagine that you could annotate each gene with its domains in a particular order, like PFAM1-PFAM5-PFAM3. Any other gene with those same domains in that same order would then be considered the same gene. You could generate a tab-delimited file that has each gene ID followed by its domains, and that file would be easy to parse for other uses.
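
A rough sketch of that idea, assuming the domain hits are already available as (gene ID, domain, start position) tuples. The function names and the toy data below are hypothetical, and comparing by exact string match is just the simplest possible rule.

```python
import csv
from collections import defaultdict


def domain_architectures(hits):
    """Build an ordered architecture string such as "PFAM1-PFAM5-PFAM3" per gene.

    hits: iterable of (gene_id, domain, start) tuples (assumed input format).
    Domains are ordered by their start position along the sequence.
    """
    by_gene = defaultdict(list)
    for gene_id, domain, start in hits:
        by_gene[gene_id].append((start, domain))
    return {
        gene_id: "-".join(domain for _, domain in sorted(domains))
        for gene_id, domains in by_gene.items()
    }


def write_architecture_table(architectures, path):
    """Write one tab-delimited line per gene: gene ID, then its architecture."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        for gene_id in sorted(architectures):
            writer.writerow([gene_id, architectures[gene_id]])


def genes_matching(architectures, query_architecture):
    """Return the genes whose domains match the query in both content and order."""
    return sorted(g for g, a in architectures.items() if a == query_architecture)


# Hypothetical toy data: gene IDs, domain labels, and coordinates are made up.
hits = [
    ("gene1", "PFAM5", 120), ("gene1", "PFAM1", 10), ("gene1", "PFAM3", 300),
    ("gene2", "PFAM1", 5), ("gene2", "PFAM5", 90), ("gene2", "PFAM3", 250),
    ("gene3", "PFAM3", 10), ("gene3", "PFAM1", 200),
]
architectures = domain_architectures(hits)
matches = genes_matching(architectures, "PFAM1-PFAM5-PFAM3")
print(matches)
```

Calling `write_architecture_table(architectures, "architectures.tsv")` would then produce the tab-delimited file; the file name is only an example.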

To use Pfam, it looks like they have their own in-house tools: ftp://ftp.sanger.ac.uk/pub/databases/Pfam/Tools/