You are welcome to try this via biomaRt. I admit that I do not fully understand how Ensembl determines these mappings 'behind the scenes', but I imagine (and hope) that they are based on genome alignments and/or general sequence homology, following GenoMax's point.
This is an elaboration of an answer that I gave on Bioconductor: https://support.bioconductor.org/p/132551/#132568
Setup
library(biomaRt)
# List the datasets on the Ensembl BioMart and look for the rhesus macaque one
datasets <- listDatasets(useMart('ensembl'))
datasets[grep('mmulatta', datasets[,1]),]
dataset description version
104 mmulatta_gene_ensembl Macaque genes (Mmul_10) Mmul_10
# Connect to the rhesus macaque and human gene datasets
rhesus <- useMart('ensembl', dataset = 'mmulatta_gene_ensembl')
human <- useMart('ensembl', dataset = 'hsapiens_gene_ensembl')
Create a rhesus macaque (M. mulatta) annotation lookup table (optional)
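# Ensembl gene ID, VGNC ID, VGNC transcript name and gene name for rhesus genes
table <- getBM(
  attributes = c('ensembl_gene_id', 'vgnc', 'vgnc_trans_name', 'external_gene_name'),
  mart = rhesus)
# Show the first 30 rows that have a gene name
head(table[table$external_gene_name != '',], 30)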
ensembl_gene_id vgnc vgnc_trans_name external_gene_name
2 ENSMMUG00000036181 U6
3 ENSMMUG00000000634 VGNC:100195 ZNF692-201 ZNF692
4 ENSMMUG00000000634 VGNC:100195 ZNF692-202 ZNF692
5 ENSMMUG00000000634 VGNC:100195 ZNF692-203 ZNF692
6 ENSMMUG00000037875 ZNF672
7 ENSMMUG00000000632 VGNC:77334 SH3BP5L-201 SH3BP5L
8 ENSMMUG00000000632 VGNC:77334 SH3BP5L-202 SH3BP5L
9 ENSMMUG00000000632 VGNC:77334 SH3BP5L-203 SH3BP5L
14 ENSMMUG00000031199 OR2T27
17 ENSMMUG00000025700 Y_RNA
18 ENSMMUG00000049101 OR2T10
21 ENSMMUG00000055842 OR14I1
23 ENSMMUG00000062331 OR9G1
26 ENSMMUG00000005566 OR2T1
27 ENSMMUG00000064874 OR2T6
29 ENSMMUG00000038524 OR2M4
35 ENSMMUG00000062908 Y_RNA
36 ENSMMUG00000057748 OR2L2
37 ENSMMUG00000051218 Y_RNA
38 ENSMMUG00000050041 OR2L5
40 ENSMMUG00000054556 Y_RNA
43 ENSMMUG00000059751 Y_RNA
46 ENSMMUG00000038159 OR2W3
47 ENSMMUG00000005570 VGNC:100177 TRIM58-201 TRIM58
48 ENSMMUG00000006214 OR11L1
50 ENSMMUG00000049478 OR9H1P
51 ENSMMUG00000062849 OR1C1
52 ENSMMUG00000064145 OR14K1
53 ENSMMUG00000059455 OR14A2
54 ENSMMUG00000016175 OR6F1
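Because this lookup table is built at transcript level (via vgnc_trans_name), a gene with several VGNC transcripts occupies several rows, as ZNF692 and SH3BP5L do above. A minimal base-R sketch for collapsing it to one row per gene, assuming you only need the gene-level columns; the small data frame simply re-types a few rows from the output above so that the example runs without a BioMart connection:

```r
# A few rows re-typed from the getBM() output above (not a live query)
lookup <- data.frame(
  ensembl_gene_id = c("ENSMMUG00000000634", "ENSMMUG00000000634",
                      "ENSMMUG00000000634", "ENSMMUG00000037875",
                      "ENSMMUG00000000632", "ENSMMUG00000000632"),
  vgnc = c("VGNC:100195", "VGNC:100195", "VGNC:100195", "",
           "VGNC:77334", "VGNC:77334"),
  vgnc_trans_name = c("ZNF692-201", "ZNF692-202", "ZNF692-203", "",
                      "SH3BP5L-201", "SH3BP5L-202"),
  external_gene_name = c("ZNF692", "ZNF692", "ZNF692", "ZNF672",
                         "SH3BP5L", "SH3BP5L"),
  stringsAsFactors = FALSE
)

# Drop the transcript-level column, then keep the first row of each
# remaining (gene ID, VGNC ID, gene name) combination
gene_level <- unique(lookup[, c("ensembl_gene_id", "vgnc", "external_gene_name")])
rownames(gene_level) <- NULL
gene_level
```

On the real table you would apply the same unique() call to the full getBM() result.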
Create a merged annotation table between rhesus macaque (M. mulatta) and human (H. sapiens)
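# getLDS() links the two datasets (for Ensembl, via homology mapping), returning
# the rhesus attributes alongside those of the matching human gene(s)
table <- getLDS(
  mart = rhesus,
  attributes = c('ensembl_gene_id', 'vgnc', 'external_gene_name', 'chromosome_name'),
  martL = human,
  attributesL = c('ensembl_gene_id', 'hgnc_symbol', 'gene_biotype', 'chromosome_name'))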
head(table)
Gene.stable.ID VGNC.ID Gene.name Chromosome.scaffold.name
1 ENSMMUG00000052610 1
2 ENSMMUG00000065379 ND4L MT
3 ENSMMUG00000060150 5S_rRNA QNVO02001753.1
4 ENSMMUG00000065383 ATP8 MT
5 ENSMMUG00000063555 8
6 ENSMMUG00000065366 COX3 MT
Gene.stable.ID.1 HGNC.symbol Gene.type Chromosome.scaffold.name.1
1 ENSG00000271254 protein_coding KI270711.1
2 ENSG00000212907 MT-ND4L protein_coding MT
3 ENSG00000278457 rRNA KI270442.1
4 ENSG00000228253 MT-ATP8 protein_coding MT
5 ENSG00000198695 MT-ND6 protein_coding MT
6 ENSG00000198938 MT-CO3 protein_coding MT
dim(table)
[1] 23385 8
table[6000:6010,]
Gene.stable.ID VGNC.ID Gene.name Chromosome.scaffold.name
6000 ENSMMUG00000007970 3
6001 ENSMMUG00000005091 VGNC:75524 NRBP1 13
6002 ENSMMUG00000010505 VGNC:82153 GTF3C2 13
6003 ENSMMUG00000009610 VGNC:81776 TMEM196 3
6004 ENSMMUG00000061977 SPIRE1 18
6005 ENSMMUG00000000452 LSM7 19
6006 ENSMMUG00000003962 VGNC:76941 RRM1 14
6007 ENSMMUG00000014165 CHADL 10
6008 ENSMMUG00000009818 VGNC:75393 NPY 3
6009 ENSMMUG00000026995 mml-mir-320a 8
6010 ENSMMUG00000003815 VGNC:69540 ADAMTSL5 19
Gene.stable.ID.1 HGNC.symbol Gene.type Chromosome.scaffold.name.1
6000 ENSG00000146574 CCZ1B protein_coding 7
6001 ENSG00000115216 NRBP1 protein_coding 2
6002 ENSG00000115207 GTF3C2 protein_coding 2
6003 ENSG00000173452 TMEM196 protein_coding 7
6004 ENSG00000134278 SPIRE1 protein_coding 18
6005 ENSG00000130332 LSM7 protein_coding 19
6006 ENSG00000167325 RRM1 protein_coding 11
6007 ENSG00000100399 CHADL protein_coding 22
6008 ENSG00000122585 NPY protein_coding 7
6009 ENSG00000208037 MIR320A miRNA 8
6010 ENSG00000185761 ADAMTSL5 protein_coding 19
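To convert your own list of rhesus gene IDs with this merged table, one option is to look each ID up in Gene.stable.ID and count how many distinct human genes it maps to, since homology mappings are not guaranteed to be one-to-one. A base-R sketch, assuming the column names shown above; the data frame re-types a few rows from that output, and the last query ID is a hypothetical placeholder that is deliberately absent from the table:

```r
# A few rows re-typed from the getLDS() output above (not a live query)
ortho <- data.frame(
  Gene.stable.ID   = c("ENSMMUG00000005091", "ENSMMUG00000010505",
                       "ENSMMUG00000003962", "ENSMMUG00000009818",
                       "ENSMMUG00000026995"),
  Gene.stable.ID.1 = c("ENSG00000115216", "ENSG00000115207",
                       "ENSG00000167325", "ENSG00000122585",
                       "ENSG00000208037"),
  HGNC.symbol      = c("NRBP1", "GTF3C2", "RRM1", "NPY", "MIR320A"),
  stringsAsFactors = FALSE
)

# Rhesus IDs to convert; "hypothetical_id" stands in for an unmapped ID
query <- c("ENSMMUG00000003962", "ENSMMUG00000009818", "hypothetical_id")

# All matching rows (keeps every human partner if an ID maps one-to-many)
hits <- ortho[ortho$Gene.stable.ID %in% query, ]

# Number of distinct human genes per queried rhesus gene (0 = no mapping)
n_human <- sapply(query, function(id)
  length(unique(hits$Gene.stable.ID.1[hits$Gene.stable.ID == id])))

# Simple lookup: first human symbol per query, NA if there is none
symbol <- ortho$HGNC.symbol[match(query, ortho$Gene.stable.ID)]
data.frame(query, n_human, symbol, row.names = NULL)
```

match() only ever returns the first hit, so it is worth checking n_human for values above 1 before relying on the simple lookup.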
Rather than doing ID conversions, why not do a sequence search? You can download transcripts for Macaca mulatta from NCBI.
This seems feasible, but I want to convert tens of thousands of transcript IDs. Could you tell me how I can do this with a sequence search?
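One possible route at that scale, sketched under assumptions that do not come from this thread: search all rhesus transcripts against human transcripts in a single BLAST+ run with tabular output (-outfmt 6), then keep the best-scoring hit per query in R. The standard 12 columns of -outfmt 6 are used below; the three rows are made-up placeholders for illustration, not real BLAST results:

```r
# Standard column names of BLAST+ tabular output (-outfmt 6)
cols <- c("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")

# Made-up placeholder hits; for a real file you might use
# read.delim("hits.tsv", header = FALSE, col.names = cols)
txt <- "q1\ts1\t98.5\t1200\t18\t0\t1\t1200\t5\t1204\t0\t2100
q1\ts2\t90.1\t800\t79\t2\t1\t800\t10\t809\t1e-150\t900
q2\ts3\t95.0\t500\t25\t0\t1\t500\t1\t500\t1e-200\t850"
hits <- read.delim(text = txt, header = FALSE, col.names = cols,
                   stringsAsFactors = FALSE)

# Sort by query, then by decreasing bitscore; keep the first row per query
hits <- hits[order(hits$qseqid, -hits$bitscore), ]
best <- hits[!duplicated(hits$qseqid), c("qseqid", "sseqid", "pident", "bitscore")]
best
```

A best hit per query is not the same as an orthology call; running the search in both directions and keeping reciprocal best hits is a common refinement.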