HMMER vs OrthoMCL vs BLAST: When to use what for finding gene groups
5.3 years ago
Ace ▴ 90

So, I've seen a lot of people using orthoMCL to extract gene families. I am wondering why I don't see direct searches like blastp and HMMer used more. Is there logic to using OrthoMCL when you already have an idea of what kind of genes you're looking for?

gene families similarity blast orthomcl • 2.2k views
5.3 years ago
Mensur Dlakic ★ 28k

There is a difference between family members and orthologs. Pseudouridine synthases in yeast and human are members of the same family (homologs), but they are not orthologs. In simplest terms, orthologs are copies of the same genes within the same species. Orthologs are sometimes redundant, but not always. BLAST and HMMer are used as a first pass in identifying orthologs, but it usually takes some post-processing to address the distinction I made earlier. Hence, OrthoMCL.

EDIT: The ortholog definition by Jean-Karim is a correct one. I leave my original message for posterity. My larger point is still the same: BLAST and HMMer scores or E-values alone are not enough to ascertain orthology without post-processing.
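To illustrate what "post-processing" can mean here, below is a minimal sketch of one common heuristic, reciprocal best hits (RBH). It is only an approximation of orthology, not what any particular tool does in full. The hit tuples, gene names, and scores are made up for illustration; in practice they would be parsed from tabular BLAST or HMMER output.

```python
from typing import Dict, Iterable, Set, Tuple

# (query_id, subject_id, bit_score); hypothetical format for this sketch
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Keep pairs (a, b) where b is a's best hit AND a is b's best hit."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return {(a, b) for a, b in a_best.items() if b_best.get(b) == a}


# Toy data: every gene has a significant hit, but only one pair is reciprocal.
yeast_vs_human = [
    ("yeastA", "humanA", 250.0), ("yeastA", "humanB", 180.0),
    ("yeastB", "humanA", 200.0), ("yeastB", "humanB", 90.0),
]
human_vs_yeast = [
    ("humanA", "yeastA", 248.0), ("humanA", "yeastB", 199.0),
    ("humanB", "yeastA", 181.0), ("humanB", "yeastB", 88.0),
]
rbh = reciprocal_best_hits(yeast_vs_human, human_vs_yeast)
print(rbh)
```

In this toy case yeastB has a strong hit to humanA, yet it is not called as an ortholog candidate because humanA's best hit is yeastA. That is the sense in which a score or E-value alone does not settle the question.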


"orthologs are copies of the same genes within the same species."

This is wrong. By definition, orthologs are genes derived from a speciation event, while paralogs are derived from a duplication event. Orthologs are what is usually meant when people talk about "the same gene in different species". Homologs are genes that derive from the same ancestor (orthologs + paralogs). Formally, these relationships can only be inferred from a phylogenetic tree, although shortcuts and approximations are often used for computational reasons. How much effort should go into assembling gene families depends on how much one cares about the evolutionary history of the genes.
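The definitions can be made concrete with a small sketch. It assumes you already have a gene tree whose internal nodes are labelled as speciation or duplication events (producing those labels is the hard part and is not shown). Each pair of genes is then classified by the event at their last common ancestor. The `Node` class, gene names, and tree are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional

# Event at the last common ancestor -> relationship between the two genes
LABELS = {"speciation": "orthologs", "duplication": "paralogs"}


@dataclass
class Node:
    name: str                       # gene ID for leaves, free label otherwise
    event: Optional[str] = None     # "speciation" or "duplication" (internal)
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[str]:
    """Return the gene IDs below this node."""
    if not node.children:
        return [node.name]
    found: List[str] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def classify_pairs(node: Node,
                   out: Optional[Dict[FrozenSet[str], str]] = None
                   ) -> Dict[FrozenSet[str], str]:
    """Label every leaf pair by the event at its last common ancestor."""
    if out is None:
        out = {}
    if not node.children:
        return out
    label = LABELS[node.event]  # KeyError if an internal node is unlabelled
    # Genes from different children have this node as their last common ancestor.
    for left, right in combinations(node.children, 2):
        for a in leaves(left):
            for b in leaves(right):
                out[frozenset((a, b))] = label
    for child in node.children:
        classify_pairs(child, out)
    return out


# Toy tree: an ancient duplication gave copies A and B,
# then each copy was inherited through the human/yeast split.
tree = Node("root", "duplication", [
    Node("copyA", "speciation", [Node("human_A"), Node("yeast_A")]),
    Node("copyB", "speciation", [Node("human_B"), Node("yeast_B")]),
])
relations = classify_pairs(tree)
for pair, label in sorted((sorted(p), l) for p, l in relations.items()):
    print(pair, label)
```

Note that human_A and yeast_B come out as paralogs even though they are in different species, because their last common ancestor is the duplication node.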


Right you are - I got my definitions mixed up. Thank you for catching my error.


So, from your response and that of Jean, it would be ok to use HMMer or blastp to identify a gene family within the same assembly or even a group of assemblies based on domain structure, but you'd want to use something like orthomcl to infer history of the genes?

In this case my ultimate goal is to create a tree showing the members of a large and diverse gene family. If I'm understanding correctly, the OrthoMCL > phylogenetic tree route would be somewhat redundant and heavily focused on history, while the blastp/HMMER > phylogenetic tree route would first select members based on structural/functional similarity and then add a layer of history. Am I misunderstanding?


Have a look at the TreeFam paper to see how we built gene families. This is the approach used by Ensembl Compara. If you don't care about evolutionary relationships, just compute pairwise sequence similarities and apply a clustering algorithm.
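As a rough sketch of the "pairwise similarities plus clustering" route, the snippet below does single-linkage clustering with a union-find: any two genes whose similarity score passes a cutoff end up in the same group. This is a deliberately simple stand-in and not the Markov clustering (MCL) that OrthoMCL uses. The gene IDs, scores, and the cutoff of 100 are invented for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# (gene_a, gene_b, similarity_score), e.g. a bit score; hypothetical input
Pair = Tuple[str, str, float]


def cluster_by_similarity(pairs: Iterable[Pair],
                          min_score: float) -> List[List[str]]:
    """Single-linkage clustering: link genes whose score is >= min_score."""
    parent: Dict[str, str] = {}

    def find(gene: str) -> str:
        # Register unseen genes as their own cluster, then walk to the root.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path compression
            gene = parent[gene]
        return gene

    for a, b, score in pairs:
        root_a, root_b = find(a), find(b)  # also records low-scoring genes
        if score >= min_score and root_a != root_b:
            parent[root_a] = root_b

    groups: Dict[str, List[str]] = {}
    for gene in list(parent):
        groups.setdefault(find(gene), []).append(gene)
    return sorted(sorted(members) for members in groups.values())


pairs = [
    ("g1", "g2", 300.0),
    ("g2", "g3", 150.0),
    ("g3", "g4", 40.0),   # below the cutoff, so it does not link the groups
    ("g4", "g5", 220.0),
]
families = cluster_by_similarity(pairs, min_score=100.0)
print(families)
```

Single linkage tends to chain loosely related sequences into one large group (for example through a shared domain), so the cutoff matters a lot. Graph-based methods such as MCL are generally preferred for real data.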
