You want to do orthologue identification with OMA and therefore the first task you describe above needs some correction to be successful:
1. Given a list of species, download one proteome annotation file in FASTA format for each, as the documentation describes. You do not need every assembly of a species, or any assembly at all, only the single representative proteome file. The filename should be the name of the genome.
2. Run OMA, or another orthologue identification tool, on these files as described in that software's documentation.
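To illustrate the file naming in step 1, here is a minimal Python sketch that unpacks a downloaded, gzipped protein FASTA into a directory under the genome's name. The helper names `genome_filename` and `install_proteome` are made up for this example, and the `DB` directory with a `.fa` extension is my assumption about what OMA standalone expects, so check the OMA documentation before relying on it.

```python
import gzip
import re
import shutil
from pathlib import Path


def genome_filename(species):
    """Turn a species name into a file name such as 'Homo_sapiens.fa'."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", species.strip()).strip("_")
    return f"{cleaned}.fa"


def install_proteome(gz_path, species, db_dir="DB"):
    """Decompress a gzipped protein FASTA into db_dir under the genome's name.

    db_dir="DB" and the .fa extension are assumptions, not taken from the thread.
    """
    target = Path(db_dir) / genome_filename(species)
    target.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

For example, `install_proteome("GCF_xxx_protein.faa.gz", "Homo sapiens")` would write `DB/Homo_sapiens.fa`; the input file name here is only a placeholder.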
I have a simple shell script that can automatically download the proteome of the representative genome, if one exists. For genomes where the gene prediction pipeline has not been run, however, it cannot give you anything.
The following shell script downloads the genome assembly and proteome file from NCBI for a list of species names. It does a little more than step 1 requires, but you will figure that out. Be careful: there is not much error checking, so if the species list contains typos or a species has no annotated proteome, the script may fail miserably. It also leaves the results of the Entrez queries around for your records.
You need to have the Entrez e-utils installed and available on your PATH.
I am ignoring the Python tag here because it is not important for making things work.
Why, if it works? After you have downloaded everything, the next step is to invoke OMA from the command line. Whether you wrap this process in Python or R makes no difference. Of course, you can write code similar to the above in Python or R. For R, there is the package biomartr, which can download genomic data from different sources. For Python, there should be a solution in Biopython, and there is a related question on Biostars: Download NCBI genome sequences from Python
Possibly, someone else can help you with such an implementation, but it won't be substantially easier or less error-prone than using my script.
It is definitely possible, but futile for an OMA analysis. There is no genome annotation for Abrostola tripartita, hence no proteome, and the other links point to multiple taxa. I am not sure why you insist on Python (I am guessing 'assignment' or 'order from your boss'), but if you need a Python solution, I cannot help you. I am bumping this post so that others can see it and possibly help out, but I personally think it is best to approach the problem in a solution-oriented way, not a tool-centric one.
If you want to infer OMA HOGs, you will need the protein sequences for all your genomes. Either you restrict yourself to genomes that already have annotated protein sequences available, or you first need to infer them yourself. There are tons of tools and pipelines for that, but it won't be very easy to do.
Michael's script is very helpful for downloading the genomes, and also the protein sequences if they are available. In my view, you shouldn't insist on it being a Python script. His code makes use of the EntrezTool from NCBI, which is perfect. Biopython also has a wrapper for it, so you could rewrite Michael's script in Python if you (or your boss) insist.
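If you do go the Python route, the lookup could be sketched roughly as below. This is not Michael's script and it does not use Biopython; it calls the NCBI E-utilities web endpoints directly with the standard library. All function names are hypothetical. The search term (`txid…[Organism:exp] AND latest[filter]`), the `ftppath_refseq` key in the JSON summary, and the `<assembly>_protein.faa.gz` file naming are my assumptions about NCBI's current conventions and should be verified. There is no rate limiting or error handling.

```python
import json
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def assembly_search_url(taxid):
    """Build an esearch URL for the latest assemblies of one NCBI taxonomy ID."""
    # Assumed query syntax; adjust the filter if you only want representative genomes.
    term = f"txid{taxid}[Organism:exp] AND latest[filter]"
    query = urllib.parse.urlencode(
        {"db": "assembly", "term": term, "retmode": "json"}
    )
    return f"{EUTILS}/esearch.fcgi?{query}"


def proteome_url(ftp_path):
    """Derive the protein FASTA URL from an assembly's FTP directory."""
    base = ftp_path.rstrip("/")
    name = base.rsplit("/", 1)[-1]  # directory name doubles as the file prefix
    return f"{base}/{name}_protein.faa.gz"


def fetch_json(url):
    """Download a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def find_proteome_url(taxid):
    """Return a proteome URL for the first assembly with a RefSeq path, else None."""
    ids = fetch_json(assembly_search_url(taxid))["esearchresult"]["idlist"]
    if not ids:
        return None
    query = urllib.parse.urlencode(
        {"db": "assembly", "id": ",".join(ids), "retmode": "json"}
    )
    summary = fetch_json(f"{EUTILS}/esummary.fcgi?{query}")["result"]
    for uid in summary.get("uids", []):
        ftp_path = summary[uid].get("ftppath_refseq") or ""
        if ftp_path:  # empty when the assembly has no RefSeq record
            return proteome_url(ftp_path)
    return None
```

Calling `find_proteome_url(9606)` would query NCBI for human assemblies. A `None` result means no assembly with a RefSeq path was found, which is the "no annotated proteome" case described above. Even when a URL is returned, the file itself may be missing, so check the download.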
To download genomes from OMA, there is also an export function ( https://omabrowser.org/export ) where you can select your genomes of interest and export a tarball that includes OMA standalone and the precomputed All-vs-All homology search files.
This might be helpful if you haven't come across it yet:
Expanding the Orthologous Matrix (OMA) programmatic interfaces: REST API and the OmaDB packages for R and Python
I want to fetch genomes from NCBI, not from OMA.
First, I want to supply NCBI species IDs to download the genomes.