Dear all, I am afraid this question has been asked several times, but I failed to find any of the previous posts. I have 800 bacterial genomes. Some of them may contain a group of three consecutive genes, possibly interrupted by insertions of foreign genes. Sometimes one or two genes of the group are lost; that does not matter, I still need the ones that remain. What is the easiest way to find orthologs of these genes in all 800 genomes? I am not sure that simply aligning each of the three gene sequences against all 800 genomes will help. (I read such a discussion some time ago, but I have not found it again.) I also do not know of good software for this. I hope there is a better way that I have forgotten about. Thank you very much! Sincerely, Natasha
This would not be a simple thing since you admit that
I would suggest that you use the three genes independently to locate their homologs in the 800 genomes and then reconcile the results to see whether the hits lie within a certain distance of each other and/or occur in the order you expect.
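The reconciliation step could be sketched roughly as follows. This is only a minimal illustration, not a tested pipeline: it assumes the hits for the three genes have already been parsed (for example from a tabular search report) into `(genome, contig, gene, start, end)` tuples, and the names `group_hits`, `order_ok` and the `max_gap` threshold are hypothetical choices made for this example.

```python
from collections import defaultdict


def group_hits(hits, max_gap=5000):
    """Group hits lying close together on the same contig.

    hits: iterable of (genome, contig, gene, start, end) tuples (assumed format).
    Returns a list of clusters; within a cluster, each hit starts at most
    max_gap bp after the end of the previous hit.
    """
    by_contig = defaultdict(list)
    for hit in hits:
        by_contig[(hit[0], hit[1])].append(hit)

    clusters = []
    for key in sorted(by_contig):
        contig_hits = sorted(by_contig[key], key=lambda h: h[3])  # sort by start
        current = [contig_hits[0]]
        for hit in contig_hits[1:]:
            if hit[3] - current[-1][4] <= max_gap:
                current.append(hit)
            else:
                clusters.append(current)
                current = [hit]
        clusters.append(current)
    return clusters


def order_ok(cluster, expected=("g1", "g2", "g3")):
    """True if the cluster's genes follow the expected order or its reverse.

    Missing genes are allowed: the observed names only need to be a
    subsequence of the expected order (or of the reversed order).
    """
    names = [h[2] for h in cluster]

    def is_subsequence(observed, reference):
        it = iter(reference)
        return all(name in it for name in observed)

    return is_subsequence(names, expected) or is_subsequence(names, expected[::-1])
```

Clusters that contain two or three of the genes and pass `order_ok` would be the candidate groups; a cluster with a single gene only says that a homolog is present, not that it was ever part of the group.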
Ortholog-finder may also come in handy.
Before I really chip in here, can I ask for a clarification of the following?
Do I understand correctly that, from your group of three, up to two genes can be lost (so only one of the three remains)? How would you then detect that the remaining one was once part of that group?
I'm asking because I might have an approach, but it has a lower limit of three remaining genes (e.g. in a group of four, one can be lost); with fewer it becomes less feasible or even impossible.
Unfortunately, it is possible. A situation like gene1-insertion-gene3 is common, as is any single gene remaining from the three: gene1-insertion1-insertion2, insertion1-gene2-insertion2 or insertion1-insertion2-gene3. I will have to check these three genes separately, as @genomax suggested. Oh, and measure the distance between the genes in a case like gene1-insertion-gene3. But how can I make this easy? Will ortholog-finder help with this task?
I understand, but what I mean is that if only gene1 is left, you cannot determine whether it was once part of the group (as opposed to having always been a single gene, with the other two never there). Not without additional evidence, that is. Are the inserted ones 'conserved'?
Is the order important, by the way? Is it always g1 g2 g3, or can it be g2 g1 g3, ...?
The order is strongly conserved; it depends only on the strand. It is either g1 g2 g3 or g3 g2 g1. Actually, I was wrong: I don't have insertions. Instead, any of the three genes may simply be replaced by some 'hypothetical' gene that is not orthologous to the gene it replaces.
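With a fixed order and replacements rather than insertions, the search reduces to scanning every window of three consecutive genes on a contig. The sketch below is one hypothetical way to do that, assuming each genome's annotated genes have already been labelled in contig order as "g1", "g2", "g3", or `None` for anything else (including a 'hypothetical' replacement). The function name `find_triplets` and the `min_match` parameter are made up for this illustration.

```python
def find_triplets(labels, pattern=("g1", "g2", "g3"), min_match=2):
    """Scan consecutive genes for the pattern, allowing replaced members.

    labels: ortholog labels of the genes in contig order; None marks any
            gene that is not one of the pattern genes (assumed input format).
    Returns a list of (start_index, orientation, n_matched) tuples.
    """
    targets = {
        "forward": tuple(pattern),
        "reverse": tuple(reversed(pattern)),
    }
    size = len(pattern)
    results = []
    for i in range(len(labels) - size + 1):
        window = labels[i:i + size]
        for orientation, target in targets.items():
            pairs = list(zip(window, target))
            matched = sum(1 for got, want in pairs if got == want)
            # A pattern gene sitting in the wrong slot breaks the conserved order.
            misplaced = any(got is not None and got != want for got, want in pairs)
            if matched >= min_match and not misplaced:
                results.append((i, orientation, matched))
    return results


# Example: g1-X-g3 on the forward strand, then g3-g2-g1 on the reverse strand.
labels = [None, "g1", None, "g3", None, "g3", "g2", "g1", None]
print(find_triplets(labels))
```

With `min_match=2` a window needs at least two of the three genes in their expected slots. Lowering it to 1 would also report lone genes, but, as discussed above, a lone gene cannot be shown to have belonged to the group, and a lone g2 would match both orientations.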
Dear all, many thanks for your answers, all of them are really helpful!