So, this is a conceptual question about comparing the results of two tools: BUSCO and OrthoFinder.
I recently compared the BUSCO output for one of the plant genomes I am working on with an OrthoFinder result that classifies 22 plant proteomes, including the one I analyzed with BUSCO. (In BUSCO 3 terms the genome is C:96.6%[S:95.3%,D:1.3%],F:1.5%,M:1.9%,n:1375; BUSCO 3 was the version I used when I checked the completeness of this particular genome.)
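For readers less used to the BUSCO notation, here is a minimal sketch that parses the summary string quoted above and checks that C is the sum of S and D. The function name `parse_busco_summary` is made up for illustration and is not part of BUSCO; the approximate single-copy count is just the percentage multiplied by n, not a number taken from the BUSCO report.

```python
import re

def parse_busco_summary(summary):
    """Parse a one-line BUSCO summary into percentages plus the BUSCO count n.

    Hypothetical helper for illustration only; not part of BUSCO itself.
    """
    pattern = (
        r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
        r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
    )
    match = re.search(pattern, summary)
    if match is None:
        raise ValueError("not a recognizable BUSCO summary line")
    # Percentages become floats; n (number of BUSCOs searched) stays an int.
    stats = {k: float(v) for k, v in match.groupdict().items() if k != "n"}
    stats["n"] = int(match.group("n"))
    return stats

stats = parse_busco_summary("C:96.6%[S:95.3%,D:1.3%],F:1.5%,M:1.9%,n:1375")

# Complete (C) = single-copy (S) + duplicated (D), up to rounding.
assert abs(stats["S"] + stats["D"] - stats["C"]) < 1e-6

# Rough number of complete single-copy BUSCOs implied by the percentage.
approx_single_copy = round(stats["S"] / 100 * stats["n"])
print(stats, approx_single_copy)
```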
Now I am trying to explain what confuses me:
First of all, BUSCO has 1375 gene models for Arabidopsis that are supposed to be single-copy genes present in all plant genomes. How can this be if my OrthoFinder result gives me only 30 Orthogroups that are all "1111" (each species has exactly one copy of the gene in the group) across only 20 plant proteomes?
If I take the amino acid sequences from the BUSCO output and compare them (BLAST or DIAMOND) to the 20 proteomes, I get 1670 significantly similar genes from all 20 proteomes, with 60% or more similarity. These genes fall into 1009 Orthogroups in the OrthoFinder output. Still, only ~560 (of 1670) appear in groups in which all 20 species are present, and only 850 (of 1670) are singletons in their Orthogroup. How come?
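To make the tally concrete, here is a rough sketch of how orthogroups can be sorted into "all species present" and "exactly one copy in every species". It assumes a tab-separated gene count table laid out like OrthoFinder's Orthogroups.GeneCount.tsv (an `Orthogroup` column, one column per species, and a `Total` column). The function name `classify_orthogroups`, the species names, and the toy counts are invented for illustration and are not my real data.

```python
import csv
import io

def classify_orthogroups(tsv_text):
    """Split orthogroups by species coverage and copy number.

    Assumes columns: Orthogroup, <one per species>, Total.
    Hypothetical helper, not part of OrthoFinder.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    species = [c for c in reader.fieldnames if c not in ("Orthogroup", "Total")]
    result = {"all_species": [], "single_copy_all": []}
    for row in reader:
        counts = [int(row[s]) for s in species]
        if all(c >= 1 for c in counts):
            # Every species contributes at least one gene to this group.
            result["all_species"].append(row["Orthogroup"])
            if all(c == 1 for c in counts):
                # The strict "1111" pattern: exactly one gene per species.
                result["single_copy_all"].append(row["Orthogroup"])
    return result

# Toy table with three made-up species.
toy_table = (
    "Orthogroup\tspA\tspB\tspC\tTotal\n"
    "OG0000000\t1\t1\t1\t3\n"   # single copy in every species
    "OG0000001\t2\t1\t1\t4\n"   # all species present, one duplication
    "OG0000002\t1\t0\t1\t2\n"   # missing from spB
)

groups = classify_orthogroups(toy_table)
print(groups)
```

In the toy table a single duplication in one species already removes a group from the strict "1111" set, even though every species is still present in it.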
Was it incorrect to compare the protein sequences output by BUSCO to the other proteins and to hypothesize that most of them would fall into Orthogroups with all species present, and that most would be singletons? Why? Or does it show that my OrthoFinder result is not correct? Am I interpreting any of the results incorrectly? What am I missing?
Thanks