I'll describe my situation, some possible approaches, and then what would be really nice to have:
The needs
When comparing expression between two species, most algorithms start from one-to-one ortholog mappings. More elaborate methods start from a weighted bipartite graph (one part for each transcriptome), which is equivalent to a similarity matrix between the genes.
In my case I might use standard transcriptomes for model organisms, or more personally curated transcriptomes from less-studied organisms.
BTW, an interesting result from the SAMap article (Tarashansky et al. 2021) is that for large evolutionary distances, paralogs might be more interesting than orthologs!
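To make the "bipartite graph is equivalent to a similarity matrix" point concrete, here is a minimal sketch. The gene names and weights are made up for illustration, and `edges_to_matrix` is a hypothetical helper, not part of any published tool. Each weighted edge between a gene of species A and a gene of species B becomes one matrix entry, and gene pairs with no edge get a similarity of 0.

```python
# Hypothetical weighted edges: (gene in species A, gene in species B, similarity)
edges = [
    ("geneA1", "geneB1", 0.9),
    ("geneA1", "geneB2", 0.4),
    ("geneA2", "geneB2", 0.8),
]


def edges_to_matrix(edges):
    """Turn weighted bipartite edges into (row genes, column genes, matrix)."""
    rows = sorted({a for a, _, _ in edges})  # genes of species A
    cols = sorted({b for _, b, _ in edges})  # genes of species B
    row_idx = {gene: i for i, gene in enumerate(rows)}
    col_idx = {gene: j for j, gene in enumerate(cols)}
    # Pairs without an edge keep a similarity of 0.0
    matrix = [[0.0] * len(cols) for _ in rows]
    for a, b, weight in edges:
        matrix[row_idx[a]][col_idx[b]] = weight
    return rows, cols, matrix


rows, cols, matrix = edges_to_matrix(edges)
for gene, values in zip(rows, matrix):
    print(gene, values)
```

A 1-1 ortholog mapping is then just the special case where every row and every column has at most one non-zero entry.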
Possible approaches
- For human-mouse comparisons, some people just convert the gene symbols from lower to upper case, and use only the genes that match
- Reciprocal BLAST: BLASTing one transcriptome against the other and back, and taking the pairs that were each other's best hit. This can be used both for a 1-1 mapping and as the basis for a similarity matrix
- Using orthology tree information (I think the OMA browser uses that)
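As I understand the reciprocal-best-hit idea, it could be sketched roughly as below. This assumes both BLAST runs were saved in the standard 12-column tabular format (`-outfmt 6`), where column 1 is the query, column 2 the subject, and column 12 the bit score. The function names and the toy hit lines are my own illustration, and using the bit score to define "best" is an assumption (some people use the e-value instead).

```python
def best_hits(blast_lines):
    """Map each query to its highest-bit-score subject (BLAST tabular, outfmt 6)."""
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        # On ties, the first hit encountered is kept
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def reciprocal_best_hits(a_vs_b_lines, b_vs_a_lines):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    a_best = best_hits(a_vs_b_lines)
    b_best = best_hits(b_vs_a_lines)
    pairs = []
    for a_gene, (b_gene, _) in a_best.items():
        if b_gene in b_best and b_best[b_gene][0] == a_gene:
            pairs.append((a_gene, b_gene))
    return sorted(pairs)


def hit(query, subject, bitscore):
    """Build one made-up outfmt 6 line; only query, subject and bit score matter here."""
    return "\t".join([query, subject, "95.0", "100", "5", "0",
                      "1", "100", "1", "100", "1e-40", str(bitscore)])


# Toy data: A3's best hit is B2, but B2's best hit is A2, so A3 gets no pair
a_vs_b = [hit("A1", "B1", 180), hit("A1", "B2", 90),
          hit("A2", "B2", 150), hit("A3", "B2", 120)]
b_vs_a = [hit("B1", "A1", 180), hit("B2", "A2", 150), hit("B2", "A3", 120)]

rbh_pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(rbh_pairs)
```

Keeping all the hits with their scores, instead of only the best one per gene, would give the weighted edges for a similarity matrix rather than a 1-1 mapping.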
I'm new to this, and would be happy to learn of other approaches.
Nice to have
Ranked from best to worst:
- A website that has pre-compiled 1-1 ortholog mappings for the genes of model organisms (say mouse Mus_musculus.GRCm39.105.gtf vs human GENCODE 38)
- A web server to which I can upload FASTA files of the genomes, and which creates the mapping
- An installable app that, given two FASTA files, outputs the mapping
- A script that does the task
Thanks!