Hi, imagine we have two sequences from two different lineages of E. coli that are more than 80% similar based on BLAST results. How can we know whether these genes are orthologs or paralogs?
A common strategy to identify orthologs is called the reciprocal best hit (RCBH):
If the two sequences are each other's best match across the whole genomes of the two lineages, then they are probably direct orthologs (XA' and YA' on the image below for instance).
If other sequences in the two lineages are better matches, then each of the two sequences could actually be the ortholog of the other's paralog instead (XA' with YA'', what is called below an out-paralog).
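The reciprocal check described above can be sketched in a few lines of Python. This is only a minimal illustration, not a recommended tool: it assumes you have already run BLAST in both directions with the default tabular output (`-outfmt 6`, where column 12 is the bit score), that "best" means highest bit score, and that ties go to the first hit seen. The gene names (`XA1`, `YA1`, `XB1`, ...) and the helper `row` are made up for the example.

```python
import csv
import io


def best_hits(blast_tab):
    """Return {query: best subject} from BLAST tabular text (outfmt 6).

    Assumption: the best hit is the one with the highest bit score
    (column 12); on ties the first hit encountered is kept.
    """
    best = {}
    for fields in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(x_vs_y, y_vs_x):
    """Return sorted (x_gene, y_gene) pairs that are each other's best hit."""
    forward = best_hits(x_vs_y)
    reverse = best_hits(y_vs_x)
    return sorted((x, y) for x, y in forward.items() if reverse.get(y) == x)


def row(query, subject, pident, bits):
    """Hypothetical helper: build one outfmt 6 line with placeholder columns."""
    return "\t".join([query, subject, str(pident), "300", "0", "0",
                      "1", "300", "1", "300", "1e-50", str(bits)])


# Made-up hits: genome X searched against genome Y, and the reverse.
X_VS_Y = "\n".join([
    row("XA1", "YA1", 98.0, 900), row("XA1", "YA2", 82.0, 600),
    row("XA2", "YA2", 97.0, 880), row("XA2", "YA1", 81.0, 590),
    row("XB1", "YA1", 75.0, 400),  # XB1's best hit is YA1 ...
])
Y_VS_X = "\n".join([
    row("YA1", "XA1", 98.0, 900), row("YA1", "XB1", 75.0, 400),  # ... but YA1 prefers XA1
    row("YA2", "XA2", 97.0, 880), row("YA2", "XA1", 82.0, 600),
])

pairs = reciprocal_best_hits(X_VS_Y, Y_VS_X)
print(pairs)
```

In this toy data `XB1` is left out because its best hit `YA1` points back to `XA1`, which is the situation where the match is not a direct ortholog.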
Thank you. If two genes belong to two different isolates of E. coli (i.e., the same species but different isolates), should we call them orthologs or paralogs? Also, regarding reciprocal best hit (RCBH), which tool would you recommend?
Actually, if it is the same species, they cannot, by definition, be orthologs: the two genes are either the same gene (with possible variations, SNPs, etc.) or not. If they are not the same gene, their homology could be explained by a gene duplication in their ancestor, so they could be paralogs.
Orthologs are from different organisms. Paralogs are from the same organism. You will have to run a phylogenetic analysis and also check the sequence identity (>= 70%).
I would also suggest you read this paper by Koonin: https://doi.org/10.1146/annurev.genet.39.073003.114725
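For the identity check mentioned above, here is a rough Python sketch of computing percent identity from a pairwise alignment. It assumes the two sequences are already aligned (equal length, `-` for gaps) and counts only columns where both have a residue. Other tools define identity differently (BLAST, for example, divides by the full alignment length including gaps), so treat the exact number as dependent on that choice. The function name and the sequences are made up for the example.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over columns without gaps.

    Assumption: inputs are two rows of the same pairwise alignment,
    so they must have equal length; '-' marks a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Made-up aligned fragments: 8 of 10 columns match.
identity = percent_identity("ACGTACGTAC", "ACGTACGTTT")
print(identity, identity >= 70.0)
```

Passing the threshold only suggests homology. It does not by itself tell orthologs from paralogs, which is why the phylogenetic analysis is still needed.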