I have a data set of 700 coding sequences (saved in a reference.fa file) and I need to obtain the orthology relationships (orthologs and paralogs) of these genes against 70 species. I have downloaded the CDS of each species' genome from NCBI. Is it possible to get only these comparisons without going through the all-vs-all phase? Could I start OMA without comparing all the genomes (CDS) with each other? How do I launch these analyses?
At the moment, it is necessary to compute the all-against-all among your 70 genomes as well in order to elucidate the orthology relationships.
However, I see two fast alternatives:
1) If these 70 reference genomes are available in OMA, you could export them together with their precomputed all-against-all results. Then OMA would "just" need to compute the 700 CDS against the 70 genomes (and the CDS against themselves, but that would be very fast).
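To get a rough feel for how much work the export saves, here is a small back-of-the-envelope sketch. It assumes one unit of work per unordered genome pair, self-comparisons included, and it treats the 700 CDS as one extra "genome". The function names are made up for illustration and are not part of OMA. Real runtimes depend on genome sizes, so the counts are only a proxy.

```python
def genome_pairs(n_genomes: int) -> int:
    """Unordered genome pairs among n_genomes, self-comparisons included."""
    return n_genomes * (n_genomes + 1) // 2


def remaining_pairs(n_precomputed: int, n_new: int) -> int:
    """Pairs still to compute when the all-against-all among
    n_precomputed genomes is already available (e.g. from an export)."""
    return genome_pairs(n_precomputed + n_new) - genome_pairs(n_precomputed)


n_exported = 70  # genomes exported with precomputed all-against-all
n_new = 1        # the 700 CDS, treated as a single small "genome"

from_scratch = genome_pairs(n_exported + n_new)   # 71 * 72 / 2
with_export = remaining_pairs(n_exported, n_new)  # 70 cross pairs + 1 self pair

print(f"pairs from scratch: {from_scratch}")
print(f"pairs with export:  {with_export}")
```

Under these assumptions the job shrinks from 2556 genome pairs to 71, and each of those 71 involves the small 700-sequence set, so they should be cheap compared with genome-vs-genome pairs.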
I am trying to run OMA (version 2.4.2) on a cluster using SLURM. I basically followed the procedure described in the OMA Cheat Sheet, and in my DB directory I collected genomes for 20 species (exported from OMA using the export option) plus the genome of my species of interest. However, the array job (split into 1000 parallel jobs) seems to take a lot of time (each job runs for over 8 hours!). Is there any way to speed up this step?
Thanks for your help,
I think I would compute all vs all then.