Best way to identify proteins within a family?
6.5 years ago
Solowars ▴ 70

Dear community,

I'm interested in retrieving deep homologs for a number of genes that belong to a protein superfamily (say, GPCRs). One of my strategies was to run HMMER searches, using either an alignment or an HMM built from an alignment. From what I have read, many people rely on specific protein domains to decide which of the proteins found are true matches. In my case, my proteins have no specific domain that characterizes them; they only share a domain that is common to the rest of the family (e.g. the 7TM domain). So although my search returns a good number of good matches (proteins already annotated in the database as homologs of my query genes), many other proteins from the family appear too, which makes it hard to decide whether the uncharacterized proteins in the results are true matches or not. I tried to improve this approach by using different domain architectures, but I still retrieve false matches. I also tried adjusting the E-value and bit-score thresholds and using a different kind of search (e.g. iterative search), but I haven't found a fully satisfactory way to tackle the issue.
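For reference, the E-value and bit-score thresholds described above can be applied in a script rather than by eye. The following is a minimal sketch, assuming the per-sequence table written by `hmmsearch --tblout`, where the whitespace-separated columns start with target name, target accession, query name, query accession, full-sequence E-value and full-sequence bit score. The function names and the default cutoffs are illustrative assumptions, not recommended values.

```python
def parse_tblout(lines):
    """Yield (target_name, evalue, bitscore) from hmmsearch --tblout lines."""
    for line in lines:
        # Skip blank lines and the '#' header/footer comments.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # Column 0: target name; column 4: full-sequence E-value;
        # column 5: full-sequence bit score.
        yield fields[0], float(fields[4]), float(fields[5])


def filter_hits(lines, max_evalue=1e-10, min_bitscore=100.0):
    """Keep hits that pass both cutoffs (cutoff values are placeholders)."""
    return [
        (target, evalue, score)
        for target, evalue, score in parse_tblout(lines)
        if evalue <= max_evalue and score >= min_bitscore
    ]
```

In practice the lines would come from an open file handle, e.g. `filter_hits(open("hits.tbl"))`, where `hits.tbl` is a hypothetical output file name. Plotting the sorted bit scores before picking a cutoff can show whether there is a natural gap between the subgroup and the rest of the family.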

Any thoughts?

Thank you!

hmmer domain homology orthologs oma • 1.7k views

The sentence "In my case, my proteins don't have a specific domain characterizing them, and share a common domain with the rest of the family (e.g. the 7TM domain)." is confusing. Are you trying to find a "subfamily"? By the way, I think you will have to build phylogenetic trees in that case.


Well, let's say that we have a big receptor family, like the GPCRs, which contains receptors for a broad array of neurotransmitters. I'm interested in a specific subgroup (the receptors of one particular neurotransmitter), and these receptors have no specific domain characterizing them other than the 7TM (7-transmembrane) domain, which is common to all GPCRs. I thought about building phylogenetic trees, but the number of GPCRs and species matching a given HMM query is far too large to build a tree, so I'm trying to improve my filtering (either by adjusting the search thresholds or by improving my query) in order to reduce the set of putative proteins to a more manageable size. I know that there are several strategies for identifying proteins below the domain level, such as fingerprints or InterPro protein family predictions, but I think those can still introduce errors (e.g. misassigned proteins), which is why I'm asking whether there is a better strategy that I'm not aware of yet.
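One way to shrink the candidate set before building a tree, sketched here as an assumption rather than an established recipe from this thread, is to score every candidate against an HMM built from known members of the subgroup of interest and against HMMs built from other subgroups, then keep only the candidates whose best score clearly comes from the target model. The function name, the model names and the `min_margin` value below are hypothetical; the bit scores would come from separate HMMER searches.

```python
def assign_subgroup(scores_by_model, target_model, min_margin=20.0):
    """Return protein IDs that score best against the target model.

    scores_by_model: {model_name: {protein_id: bit_score}}
    A protein is kept when its target-model score exceeds its best score
    against any other model by at least min_margin bits (assumed cutoff).
    """
    kept = []
    for protein, score in scores_by_model.get(target_model, {}).items():
        # Best score of this protein against every non-target model;
        # proteins not hit by a model count as minus infinity there.
        best_other = max(
            (
                hits.get(protein, float("-inf"))
                for name, hits in scores_by_model.items()
                if name != target_model
            ),
            default=float("-inf"),
        )
        if score - best_other >= min_margin:
            kept.append(protein)
    return sorted(kept)
```

The retained proteins, plus a handful of representatives from the competing subgroups as outgroups, should form a set small enough for a phylogenetic tree.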


I think checking the bootstrap values of phylogenetic trees and checking the conservation of important residues in alignments are essential for such a detailed analysis. (There are some automatic approaches such as OrthoMCL (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC403725/), but as I haven't tried this software, I can't say whether it will give the expected result. Just FYI.)
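The residue-conservation check can be partly automated once the candidates are aligned to a reference sequence. The sketch below assumes the alignment is already loaded as a dictionary of equal-length gapped strings (with `-` as the gap character) and that the important positions are known in ungapped reference coordinates. The function names and the toy sequences are hypothetical.

```python
def column_of(ref_aligned, ref_pos):
    """Map a 1-based ungapped reference position to a 0-based alignment column."""
    seen = 0
    for col, char in enumerate(ref_aligned):
        if char != "-":
            seen += 1
            if seen == ref_pos:
                return col
    raise ValueError(f"reference has fewer than {ref_pos} residues")


def check_key_residues(alignment, ref_id, key_residues):
    """Report, per sequence, whether every key position holds an allowed residue.

    alignment: {seq_id: aligned_sequence}, all strings of the same length
    key_residues: {reference_position: set of allowed one-letter amino acids}
    """
    ref = alignment[ref_id]
    columns = {pos: column_of(ref, pos) for pos in key_residues}
    return {
        seq_id: all(
            seq[columns[pos]].upper() in allowed
            for pos, allowed in key_residues.items()
        )
        for seq_id, seq in alignment.items()
    }


# Toy example: reference positions 3 and 4 must be D/E and R respectively.
toy = {"ref": "MK-DRYLA", "cand1": "MKADRYLA", "cand2": "MK-DKY-A"}
print(check_key_residues(toy, "ref", {3: {"D", "E"}, 4: {"R"}}))
```

A candidate that fails this check is not necessarily a false match, but it is a reasonable one to inspect by hand or to place on a tree before accepting it.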
