Gene Annotation By Protein Domain Match Criteria
12.6 years ago
Burlappsack

Hello, I am working on annotating the members of gene families in recently sequenced genomes whose transcripts have been determined. Is there a tool that will decompose my query sequence into domains and then rank the results based on which of those domains they contain? For example, I would like to set an E-value cutoff for each domain and rank the results in descending order of the number of domains shared with the query.
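
As a minimal sketch of the ranking described above (not an existing tool), the Python below assumes the domain hits for each candidate sequence have already been collected as (domain, E-value) pairs, for example from an hmmscan search against Pfam. A hit counts only if it passes a per-domain E-value cutoff, and candidates are sorted by the number of distinct query domains they share. The function name, the default cutoff of 1e-5 and the domain names (DOM_A and so on) are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input: for each candidate sequence, its domain hits as
# (domain_name, e_value) pairs, e.g. collected from an hmmscan run against Pfam.
Hits = Dict[str, List[Tuple[str, float]]]


def rank_by_shared_domains(
    query_domains: Set[str],
    hits: Hits,
    cutoffs: Dict[str, float],
    default_cutoff: float = 1e-5,
) -> List[Tuple[str, int]]:
    """Rank candidates by the number of distinct query domains they share.

    A hit counts only if its E-value is at or below the cutoff for that
    domain (taken from `cutoffs`, falling back to `default_cutoff`).
    """
    ranked = []
    for seq_id, seq_hits in hits.items():
        shared = {
            domain
            for domain, evalue in seq_hits
            if domain in query_domains
            and evalue <= cutoffs.get(domain, default_cutoff)
        }
        ranked.append((seq_id, len(shared)))
    # Most shared domains first; ties broken alphabetically by sequence ID.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    query = {"DOM_A", "DOM_B", "DOM_C"}  # hypothetical domain names
    hits = {
        "geneA": [("DOM_A", 1e-40), ("DOM_B", 1e-12)],
        "geneB": [("DOM_A", 1e-50), ("DOM_X", 1e-60)],
        "geneC": [("DOM_A", 1e-30), ("DOM_B", 1e-3), ("DOM_C", 1e-8)],
    }
    print(rank_by_shared_domains(query, hits, cutoffs={"DOM_B": 1e-4}))
    # [('geneA', 2), ('geneC', 2), ('geneB', 1)]
```

A plain count treats every domain as equally informative, so a sequence that shares only one ubiquitous domain scores the same as one that shares a single rare, family-defining domain.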

I am currently using profile hidden Markov models, and my problem is that when the query sequence contains common domains, biologically irrelevant sequences can dominate the rankings on the strength of a single matching domain. Biologists then have to spend a great deal of effort manually parsing the results for genuine matches. I really need a tool or ranking score that will help me solve this problem, or suggestions on how I can use existing packages like HMMER and public repositories like Pfam to solve it. Has anyone encountered a similar problem, or does anyone have a perspective on a general solution?
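
One possible ranking score, sketched here under stated assumptions rather than taken from HMMER or Pfam, is to down-weight common domains in the style of inverse document frequency, then score each sequence by the weighted fraction of the query's domains it contains. The parser assumes the column layout of `hmmscan --domtblout` output (domain name in column 1, sequence name in column 4, independent E-value in column 13). The function names, the 1e-5 threshold and the smoothed weighting formula are all illustrative choices.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def parse_domtblout(lines: Iterable[str], max_ievalue: float = 1e-5) -> Dict[str, Set[str]]:
    """Collect the domains found in each sequence from hmmscan --domtblout output.

    Assumes the hmmscan column layout: domain (target) name in column 1,
    sequence (query) name in column 4, independent E-value in column 13.
    """
    domains: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        domain, seq_id, i_evalue = fields[0], fields[3], float(fields[12])
        if i_evalue <= max_ievalue:
            domains[seq_id].add(domain)
    return dict(domains)


def rank_by_weighted_coverage(
    query: Set[str], domains_by_seq: Dict[str, Set[str]]
) -> List[Tuple[str, float]]:
    """Score = weighted fraction of the query's domains found in each sequence.

    Each domain is weighted by a smoothed inverse document frequency computed
    over the sequences in domains_by_seq, so a domain present in most
    sequences contributes little and rare, family-specific domains dominate.
    """
    if not query:
        return []
    n = len(domains_by_seq)
    counts: Dict[str, int] = defaultdict(int)
    for doms in domains_by_seq.values():
        for d in doms:
            counts[d] += 1

    def weight(d: str) -> float:
        # Always >= 1; equals 1 for a domain found in every sequence.
        return math.log((1 + n) / (1 + counts.get(d, 0))) + 1.0

    total = sum(weight(d) for d in query)
    scores = [
        (seq_id, sum(weight(d) for d in query & doms) / total)
        for seq_id, doms in domains_by_seq.items()
    ]
    # Highest weighted coverage first; ties broken by sequence ID.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores
```

The weights are only as good as the background they are computed from. Counting over a broad set, such as every predicted protein in the genome scanned against Pfam, should give more stable weights than counting over a short list of top hits.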

Thank you very much for your time, Adam

gene prediction next-gen • 1.8k views
12.6 years ago
Lee Katz

I don't work on this specific issue, but I'd imagine that you can annotate a gene with its domains in a particular order, like PFAM1-PFAM5-PFAM3. Any other gene with those specific domains in that order would then be considered the same gene. You could therefore generate a tab-delimited file that lists each gene ID followed by its domains, and that file would be easy to parse for other uses.
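
The domain-order idea can be sketched in Python, assuming each domain hit is available as a hypothetical (gene_id, domain, start) tuple, where start is the domain's start coordinate on the protein (for instance the envelope start reported by hmmscan). Domains are ordered by start position and joined with "-", genes with an identical architecture string are grouped together, and the result is rendered as two-column tab-delimited text. All function names here are illustrative.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def architectures(hits: Iterable[Tuple[str, str, int]]) -> Dict[str, str]:
    """Map each gene to its domain architecture, e.g. 'PFAM1-PFAM5-PFAM3'.

    hits: (gene_id, domain_name, start_on_protein) tuples; domains are
    ordered by their start coordinate along the protein.
    """
    by_gene: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for gene_id, domain, start in hits:
        by_gene[gene_id].append((start, domain))
    return {g: "-".join(d for _, d in sorted(doms)) for g, doms in by_gene.items()}


def group_by_architecture(arch: Dict[str, str]) -> Dict[str, List[str]]:
    """Group gene IDs that share exactly the same architecture string."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for gene_id, a in sorted(arch.items()):
        groups[a].append(gene_id)
    return dict(groups)


def to_tsv(arch: Dict[str, str]) -> str:
    """Render gene ID / architecture pairs as tab-delimited text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for gene_id in sorted(arch):
        writer.writerow([gene_id, arch[gene_id]])
    return buf.getvalue()
```

Exact string matching is strict: a gene that lacks one domain or carries an extra tandem copy lands in a different group, so the groups are probably best treated as candidates for review rather than final annotations.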

For Pfam, it looks like they have their own in-house tool: ftp://ftp.sanger.ac.uk/pub/databases/Pfam/Tools/
