Dear colleagues,
Recently I successfully complete OMA standalone run and now I am trying to analyze results using the pyHam python package. The main task for me is to reconstruct the genome of the ancestor of the analyzed species. After executing the following commands, I get information that in the model of the ancestor genome are 10372 genes:
ham_analysis = pyham.Ham(nwk_file, orthoxml_file, use_internal_name=True)
ancestral_genome = ham_analysis.get_ancestral_genome_by_name(ancestral_genome_name)
ancestral_genes = ancestral_genome.genes
print(len(ancestral_genes))
I am a little alarmed by two facts: 1) In ancestral_genes, many elements (HOG) have this designation <HOG ()>, and do not have a number, in contrast to, for example, <HOG (17899)> or <HOG (17900)>. Is it mistake if previously the sequences assigned to <HOG ()> were included in numbered orthogroups? Or am I missing something? The same designations are found even in the tutorial (https://zoo.cs.ucl.ac.uk/tutorials/tutorial_pyHam_get_started.html) [In [37]].
2) However, if we extract information about the descendant genes in these <HOG ()>, we will see that some of these groups of orthologs contain only 1 or 2 sequences of modern species. Correct me, please, if I am mistaken, but how can it be considered that the ancestor had a gene if it is present only in 1 or 2 species out of, for example, 11 analyzed? Is it worth specifying a certain threshold for the number of modern species in which a gene must be present to reconstruct the ancestor's genome? And how to do it most efficiently and correctly?
I would be grateful for any help!
How did you extract information about the descendent genes in HOGs without numbers <HOG()> ? I have a ton of these but I can't find a way to do it.
you can use the method
get_all_descendant_genes()
of the Hog objects to retrieve the extant genes.Thank you, I think I might be having a syntax issue with
get_all_descendant_genes()
you need to iterate over all hogs in the hog1 set (hogs1 is a set of lost hogs).
ahh that makes sense. Thank you!