Question

How to convert rhesus monkey transcript IDs to Homo sapiens transcript IDs

0

4.3 years ago

Quanyou • 0

I know how to convert homologous gene IDs between species. Now I want to convert rhesus monkey transcript IDs to Homo sapiens transcript IDs, because I am interested in specific transcripts. Could anyone tell me how to do this?

Thanks in advance.

transcript rna-seq • 4.2k views

ADD COMMENT • link updated 4.3 years ago by tamerg ▴ 100 • written 4.3 years ago by Quanyou • 0

0

Rather than doing ID conversions why not do a sequence search? You can download transcripts for Macaca mulatta from NCBI.

ADD REPLY • link 4.3 years ago by GenoMax 153k

0

This seems feasible, but I want to convert tens of thousands of transcript IDs. Could you tell me how to do that with a sequence search?

ADD REPLY • link 4.3 years ago by Quanyou • 0
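
For the bulk case, one possible route is to run `blastn` with all rhesus transcripts as the query against a database of human transcripts, write tabular output with `-outfmt 6`, and then keep the top-scoring human hit per rhesus transcript. The R sketch below covers only that last step. The function names and the file name `rhesus_vs_human.tsv` are hypothetical, and the top hit by bit score is only a candidate match, not a confirmed ortholog (no reciprocal check is done here).

```r
# Column names of BLAST tabular output (-outfmt 6), in their default order.
blast_cols <- c('qseqid', 'sseqid', 'pident', 'length', 'mismatch', 'gapopen',
                'qstart', 'qend', 'sstart', 'send', 'evalue', 'bitscore')

# Read a tab-separated BLAST result file (no header line).
read_blast_tab <- function(path) {
  read.delim(path, header = FALSE, col.names = blast_cols,
             stringsAsFactors = FALSE)
}

# Keep one row per query: the hit with the highest bit score.
best_hits <- function(hits) {
  hits <- hits[order(hits$qseqid, -hits$bitscore), ]
  hits[!duplicated(hits$qseqid),
       c('qseqid', 'sseqid', 'pident', 'evalue', 'bitscore')]
}

# Usage (hypothetical file name):
# hits <- read_blast_tab('rhesus_vs_human.tsv')
# head(best_hits(hits))
```

Filtering on `pident`, `evalue`, or alignment length before picking the top hit would be a reasonable refinement, depending on how strict the mapping needs to be.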

1

4.3 years ago

tamerg ▴ 100

An alternative is to use biobtreeR for this mapping:

library(biobtreeR)

# 1) set an output directory
bbUseOutDir("path of a directory")

# 2) build data for the genomes of interest, including orthologs
# (fetches the latest raw data directly from Ensembl and indexes it for mapping)
bbBuildCustomDB(rawArgs="--tax 9606,9544 --ensembl-orthologs build")

# 3) start biobtreeR
bbStart()

# 4) perform the mapping
bbMapping("ENSMMUT00000110390","map(ensembl).map(ortholog).map(transcript)")

Note that the mapping function accepts multiple identifiers, and once step 2 has been run it can be skipped in later sessions. This also makes the results reproducible.

ADD COMMENT • link 4.3 years ago by tamerg ▴ 100

score 5 · Accepted Answer · 2021-04-10

I mean, you are welcome to try this, via biomaRt. I admit to not fully understanding how Ensembl have determined these mappings 'behind the scenes', but I imagine and hope that they are based on genome alignments and / or general sequence homology, following from GenoMax's point.

This is an elaboration of an answer that I gave on Bioconductor: https://support.bioconductor.org/p/132551/#132568

Setup

library(biomaRt)

listDatasets(useMart('ensembl'))

datasets <- listDatasets(useMart('ensembl'))
datasets[grep('mmulatta', datasets[,1]),]
                  dataset             description version
104 mmulatta_gene_ensembl Macaque genes (Mmul_10) Mmul_10

rhesus <- useMart('ensembl', dataset = 'mmulatta_gene_ensembl')
human <- useMart('ensembl', dataset = 'hsapiens_gene_ensembl')

Create a Rhesus macaque (M. mulatta) annotation lookup `table` (optional)

table <- getBM(
  attributes = c('ensembl_gene_id', 'vgnc', 'vgnc_trans_name', 'external_gene_name'),
  mart = rhesus)
head(table[table$external_gene_name != '',], 30)

      ensembl_gene_id        vgnc vgnc_trans_name external_gene_name
2  ENSMMUG00000036181                                             U6
3  ENSMMUG00000000634 VGNC:100195      ZNF692-201             ZNF692
4  ENSMMUG00000000634 VGNC:100195      ZNF692-202             ZNF692
5  ENSMMUG00000000634 VGNC:100195      ZNF692-203             ZNF692
6  ENSMMUG00000037875                                         ZNF672
7  ENSMMUG00000000632  VGNC:77334     SH3BP5L-201            SH3BP5L
8  ENSMMUG00000000632  VGNC:77334     SH3BP5L-202            SH3BP5L
9  ENSMMUG00000000632  VGNC:77334     SH3BP5L-203            SH3BP5L
14 ENSMMUG00000031199                                         OR2T27
17 ENSMMUG00000025700                                          Y_RNA
18 ENSMMUG00000049101                                         OR2T10
21 ENSMMUG00000055842                                         OR14I1
23 ENSMMUG00000062331                                          OR9G1
26 ENSMMUG00000005566                                          OR2T1
27 ENSMMUG00000064874                                          OR2T6
29 ENSMMUG00000038524                                          OR2M4
35 ENSMMUG00000062908                                          Y_RNA
36 ENSMMUG00000057748                                          OR2L2
37 ENSMMUG00000051218                                          Y_RNA
38 ENSMMUG00000050041                                          OR2L5
40 ENSMMUG00000054556                                          Y_RNA
43 ENSMMUG00000059751                                          Y_RNA
46 ENSMMUG00000038159                                          OR2W3
47 ENSMMUG00000005570 VGNC:100177      TRIM58-201             TRIM58
48 ENSMMUG00000006214                                         OR11L1
50 ENSMMUG00000049478                                         OR9H1P
51 ENSMMUG00000062849                                          OR1C1
52 ENSMMUG00000064145                                         OR14K1
53 ENSMMUG00000059455                                         OR14A2
54 ENSMMUG00000016175                                          OR6F1

Create a merged annotation `table` between Rhesus macaque (M. mulatta) and Human (H. sapiens)

table <- getLDS(
  mart = rhesus,
  attributes = c('ensembl_gene_id', 'vgnc', 'external_gene_name', 'chromosome_name'),
  martL = human,
  attributesL = c('ensembl_gene_id','hgnc_symbol','gene_biotype', 'chromosome_name'))

head(table)
      Gene.stable.ID VGNC.ID Gene.name Chromosome.scaffold.name
1 ENSMMUG00000052610                                          1
2 ENSMMUG00000065379              ND4L                       MT
3 ENSMMUG00000060150           5S_rRNA           QNVO02001753.1
4 ENSMMUG00000065383              ATP8                       MT
5 ENSMMUG00000063555                                          8
6 ENSMMUG00000065366              COX3                       MT
  Gene.stable.ID.1 HGNC.symbol      Gene.type Chromosome.scaffold.name.1
1  ENSG00000271254             protein_coding                 KI270711.1
2  ENSG00000212907     MT-ND4L protein_coding                         MT
3  ENSG00000278457                       rRNA                 KI270442.1
4  ENSG00000228253     MT-ATP8 protein_coding                         MT
5  ENSG00000198695      MT-ND6 protein_coding                         MT
6  ENSG00000198938      MT-CO3 protein_coding                         MT

dim(table)
[1] 23385     8

table[6000:6010,]
         Gene.stable.ID    VGNC.ID    Gene.name Chromosome.scaffold.name
6000 ENSMMUG00000007970                                                3
6001 ENSMMUG00000005091 VGNC:75524        NRBP1                       13
6002 ENSMMUG00000010505 VGNC:82153       GTF3C2                       13
6003 ENSMMUG00000009610 VGNC:81776      TMEM196                        3
6004 ENSMMUG00000061977                  SPIRE1                       18
6005 ENSMMUG00000000452                    LSM7                       19
6006 ENSMMUG00000003962 VGNC:76941         RRM1                       14
6007 ENSMMUG00000014165                   CHADL                       10
6008 ENSMMUG00000009818 VGNC:75393          NPY                        3
6009 ENSMMUG00000026995            mml-mir-320a                        8
6010 ENSMMUG00000003815 VGNC:69540     ADAMTSL5                       19
     Gene.stable.ID.1 HGNC.symbol      Gene.type Chromosome.scaffold.name.1
6000  ENSG00000146574       CCZ1B protein_coding                          7
6001  ENSG00000115216       NRBP1 protein_coding                          2
6002  ENSG00000115207      GTF3C2 protein_coding                          2
6003  ENSG00000173452     TMEM196 protein_coding                          7
6004  ENSG00000134278      SPIRE1 protein_coding                         18
6005  ENSG00000130332        LSM7 protein_coding                         19
6006  ENSG00000167325        RRM1 protein_coding                         11
6007  ENSG00000100399       CHADL protein_coding                         22
6008  ENSG00000122585         NPY protein_coding                          7
6009  ENSG00000208037     MIR320A          miRNA                          8
6010  ENSG00000185761    ADAMTSL5 protein_coding                         19
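
The table above is gene-level, whereas the question asks about transcripts. As far as I know, Ensembl orthology is defined between genes, not between individual transcripts, so the best that can be done with biomaRt alone is to list candidate human transcripts for each rhesus transcript. Below is a minimal sketch of that, reusing the `rhesus` and `human` marts from the Setup. The function names are my own, and the attribute names (`hsapiens_homolog_ensembl_gene`, `hsapiens_homolog_orthology_type`, `transcript_biotype`) are assumptions that should be checked against `listAttributes(rhesus)` and `listAttributes(human)` for your Ensembl release.

```r
# Sketch: rhesus transcript IDs -> candidate human transcript IDs via
# gene-level orthology. The two fetch functions need biomaRt and network access.

# Step 1: for each rhesus transcript, look up its gene and the human ortholog gene.
fetch_rhesus_orthologs <- function(tx_ids, rhesus) {
  biomaRt::getBM(
    attributes = c('ensembl_transcript_id', 'ensembl_gene_id',
                   'hsapiens_homolog_ensembl_gene',
                   'hsapiens_homolog_orthology_type'),
    filters = 'ensembl_transcript_id',
    values = tx_ids,
    mart = rhesus)
}

# Step 2: list every transcript of the human ortholog genes.
fetch_human_transcripts <- function(gene_ids, human) {
  biomaRt::getBM(
    attributes = c('ensembl_gene_id', 'ensembl_transcript_id',
                   'transcript_biotype'),
    filters = 'ensembl_gene_id',
    values = unique(gene_ids),
    mart = human)
}

# Step 3: join the two tables on the human gene ID (no network needed).
join_transcripts <- function(rhesus_tbl, human_tbl) {
  # Drop rhesus transcripts whose gene has no human ortholog.
  h <- rhesus_tbl$hsapiens_homolog_ensembl_gene
  rhesus_tbl <- rhesus_tbl[!is.na(h) & h != '', ]
  # Rename human columns so they do not clash with the rhesus ones.
  names(human_tbl)[names(human_tbl) == 'ensembl_gene_id'] <-
    'hsapiens_homolog_ensembl_gene'
  names(human_tbl)[names(human_tbl) == 'ensembl_transcript_id'] <-
    'human_transcript_id'
  merge(rhesus_tbl, human_tbl, by = 'hsapiens_homolog_ensembl_gene')
}

# Usage, with the transcript ID mentioned earlier in this thread:
# orth <- fetch_rhesus_orthologs('ENSMMUT00000110390', rhesus)
# hs <- fetch_human_transcripts(orth$hsapiens_homolog_ensembl_gene, human)
# join_transcripts(orth, hs)
```

Each rhesus transcript will usually map to several human transcripts this way. Narrowing that down to a single counterpart would need an extra criterion, such as sequence similarity, as GenoMax suggested.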
