Question

Why fewer orthologs? Can they give enough information?

2.4 years ago

sunnykevin97 ▴ 990

Hi,

I ran OrthoFinder on 15 closely related species and ended up with 1080 single-copy orthologs.

If I reduce the number of species to ~10, I get 1850 single-copy orthologs.

All the genomes are from non-model organisms and were sequenced at different coverages: some at 30X, others at around 70X.

What would be the reason for getting fewer orthologs?

Is 1080 orthologs a good number to proceed with selection analysis?

Suggestions please.

gene genome protein • 861 views

score 2 · Answer 1 · 2022-06-20

Not sure you will be able to do what you want, but to even stand a chance you would have to be absolutely sure that all the genomes are complete. Let me give you a simple example. If one of the genomes in either group, through the randomness of shotgun sequencing and subsequent assembly, lacks a group of 50 genes that all the others have, your ortholog count will be artificially lowered by 50. Whatever conclusions you drew from this finding would not be real, and the problem applies to each of the 15 genomes, because presumably all of them are somewhat incomplete.

Maybe you can get to some conclusion by separately finding orthologs in 70-80% of each of the two categories. This somewhat removes the randomness of sequencing and assembly, because the assumption is that if a gene is present in 70-80% of the genomes, it is present in all of them. If you compare the list of orthologs obtained this way from the two categories, there may be a difference pattern that could be informative regarding your area of interest.
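To make the 70-80% idea concrete, here is a minimal Python sketch. It assumes you already have per-species gene counts for each orthogroup loaded into a dictionary; the function name `near_universal_single_copy`, the species labels, and the toy counts are all made up for illustration. Excluding orthogroups that are multi-copy in any species is my own assumption about how to keep the set "single-copy"; relax it if that is too strict for your data.

```python
from typing import Dict, Iterable, Set


def near_universal_single_copy(
    counts: Dict[str, Dict[str, int]],
    species: Iterable[str],
    min_fraction: float = 0.75,
) -> Set[str]:
    """Return orthogroups that are single-copy in at least `min_fraction`
    of the given species and multi-copy in none of them (assumption)."""
    species = list(species)
    kept = set()
    for orthogroup, per_species in counts.items():
        copies = [per_species.get(sp, 0) for sp in species]
        if any(c > 1 for c in copies):
            continue  # duplicated somewhere, so not treated as single-copy
        n_single = sum(1 for c in copies if c == 1)
        if n_single >= min_fraction * len(species):
            kept.add(orthogroup)
    return kept


# Hypothetical toy data: two categories of four species each.
group_a = ["sp1", "sp2", "sp3", "sp4"]
group_b = ["sp5", "sp6", "sp7", "sp8"]
all_species = group_a + group_b

counts = {
    # present once everywhere
    "OG1": {sp: 1 for sp in all_species},
    # missing from one genome in group A, mostly absent from group B
    "OG2": {"sp1": 1, "sp2": 1, "sp3": 1, "sp4": 0,
            "sp5": 0, "sp6": 0, "sp7": 0, "sp8": 1},
    # duplicated in one group A genome, single-copy in group B
    "OG3": {**{sp: 1 for sp in all_species}, "sp1": 2},
}

set_a = near_universal_single_copy(counts, group_a)
set_b = near_universal_single_copy(counts, group_b)

only_in_a = set_a - set_b  # candidates specific to category A
only_in_b = set_b - set_a  # candidates specific to category B
```

The two difference sets are what you would then inspect for a pattern related to your question. The threshold is a parameter on purpose, so you can check how sensitive the lists are to choosing 70% versus 80%.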

score 1 · Answer 2 · 2022-06-20

There are two ways of looking at this. The first is a matter of probabilistic reasoning. Let's say that each of 15 genomes is 5% incomplete, and we randomly take out 5% of sequence from it. That should simulate the randomness of the assembly process. What are the chances that all 15 of them are missing the same 5% of the genome? Because if all of them were missing the same part, you would get the same number of orthologs whether comparing 10 or 15 genomes. Since they are not all missing the same part, increasing the number of genomes used for comparisons means that the subset of genes at least one of them is missing will also increase with each added genome.
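The probabilistic argument can be sketched in a few lines of Python. This is a toy model, not a description of real assemblies: it assumes every genome loses each gene independently with the same probability (`miss_rate`), which is a simplification I am adding for illustration. Under that assumption a gene survives in all n genomes with probability (1 - miss_rate)^n, and the simulation simply checks this by random sampling. All function names and parameter values are hypothetical.

```python
import random


def expected_shared_fraction(n_genomes: int, miss_rate: float) -> float:
    """Fraction of genes expected to be present in every genome when each
    genome independently misses a gene with probability `miss_rate`."""
    return (1.0 - miss_rate) ** n_genomes


def simulate_shared_fraction(
    n_genes: int, n_genomes: int, miss_rate: float, seed: int = 0
) -> float:
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    shared = 0
    for _ in range(n_genes):
        # The gene counts as shared only if no genome happens to miss it.
        if all(rng.random() >= miss_rate for _ in range(n_genomes)):
            shared += 1
    return shared / n_genes


# With 5% missing per genome, roughly 60% of genes survive in all of
# 10 genomes, but only about 46% survive in all of 15.
for n in (10, 15):
    exact = expected_shared_fraction(n, 0.05)
    approx = simulate_shared_fraction(20000, n, 0.05)
    print(f"{n} genomes: expected {exact:.3f}, simulated {approx:.3f}")
```

The point is only the direction of the effect: each added genome multiplies the shared fraction by another factor below one, so the count of orthologs found in all genomes can only go down as species are added.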

The second reason is that species normally differ in the number of genes they have, either because they live in different environments (fewer genes are needed in environments where only one or two energy sources are available) or because of their lifestyle (many microbes depend on others around them, and because of that can afford to have reduced genomes). Again, adding any new species to the analysis increases the likelihood that it will lack some genes that all the others have, which will reduce the number of orthologs.

Is 1080 orthologs a good number, to proceed further to perform selection analysis ?

I don't know what you are trying to do, so it is difficult to give good advice. To me that sounds like an awfully high number. The number of single-copy genes most bacteria share is in the low hundreds (100-200), and the same is roughly true for archaea. Naturally, if one picks more closely related microbes those numbers will be higher, as you figured out on your own.