How to convert rhesus monkey transcript IDs to Homo sapiens transcript IDs
3.8 years ago
Quanyou • 0

I know how to convert homologous gene IDs between species. Now I want to convert rhesus monkey transcript IDs to Homo sapiens transcript IDs, because I am interested in specific transcripts. Could anyone tell me how to do it?

Thanks in advance

transcript rna-seq • 3.6k views

Rather than doing ID conversions, why not do a sequence search? You can download transcripts for Macaca mulatta from NCBI.


This seems feasible, but I want to convert tens of thousands of transcript IDs. Could you tell me how I can do this with a sequence search?

3.8 years ago

I mean, you are welcome to try this, via biomaRt. I admit to not fully understanding how Ensembl have determined these mappings 'behind the scenes', but I imagine and hope that they are based on genome alignments and / or general sequence homology, following from GenoMax's point.

This is an elaboration of an answer that I gave on Bioconductor: https://support.bioconductor.org/p/132551/#132568

Setup

library(biomaRt)

# List all Ensembl datasets and find the one for Macaca mulatta
datasets <- listDatasets(useMart('ensembl'))
datasets[grep('mmulatta', datasets[,1]),]
                  dataset             description version
104 mmulatta_gene_ensembl Macaque genes (Mmul_10) Mmul_10

rhesus <- useMart('ensembl', dataset = 'mmulatta_gene_ensembl')
human <- useMart('ensembl', dataset = 'hsapiens_gene_ensembl')

Create a Rhesus macaque (M. mulatta) annotation lookup table (optional)

table <- getBM(
  attributes = c('ensembl_gene_id', 'vgnc', 'vgnc_trans_name', 'external_gene_name'),
  mart = rhesus)
head(table[table$external_gene_name != '',], 30)

      ensembl_gene_id        vgnc vgnc_trans_name external_gene_name
2  ENSMMUG00000036181                                             U6
3  ENSMMUG00000000634 VGNC:100195      ZNF692-201             ZNF692
4  ENSMMUG00000000634 VGNC:100195      ZNF692-202             ZNF692
5  ENSMMUG00000000634 VGNC:100195      ZNF692-203             ZNF692
6  ENSMMUG00000037875                                         ZNF672
7  ENSMMUG00000000632  VGNC:77334     SH3BP5L-201            SH3BP5L
8  ENSMMUG00000000632  VGNC:77334     SH3BP5L-202            SH3BP5L
9  ENSMMUG00000000632  VGNC:77334     SH3BP5L-203            SH3BP5L
14 ENSMMUG00000031199                                         OR2T27
17 ENSMMUG00000025700                                          Y_RNA
18 ENSMMUG00000049101                                         OR2T10
21 ENSMMUG00000055842                                         OR14I1
23 ENSMMUG00000062331                                          OR9G1
26 ENSMMUG00000005566                                          OR2T1
27 ENSMMUG00000064874                                          OR2T6
29 ENSMMUG00000038524                                          OR2M4
35 ENSMMUG00000062908                                          Y_RNA
36 ENSMMUG00000057748                                          OR2L2
37 ENSMMUG00000051218                                          Y_RNA
38 ENSMMUG00000050041                                          OR2L5
40 ENSMMUG00000054556                                          Y_RNA
43 ENSMMUG00000059751                                          Y_RNA
46 ENSMMUG00000038159                                          OR2W3
47 ENSMMUG00000005570 VGNC:100177      TRIM58-201             TRIM58
48 ENSMMUG00000006214                                         OR11L1
50 ENSMMUG00000049478                                         OR9H1P
51 ENSMMUG00000062849                                          OR1C1
52 ENSMMUG00000064145                                         OR14K1
53 ENSMMUG00000059455                                         OR14A2
54 ENSMMUG00000016175                                          OR6F1

Create a merged annotation table between Rhesus macaque (M. mulatta) and Human (H. sapiens)

table <- getLDS(
  mart = rhesus,
  attributes = c('ensembl_gene_id', 'vgnc', 'external_gene_name', 'chromosome_name'),
  martL = human,
  attributesL = c('ensembl_gene_id','hgnc_symbol','gene_biotype', 'chromosome_name'))

head(table)
      Gene.stable.ID VGNC.ID Gene.name Chromosome.scaffold.name
1 ENSMMUG00000052610                                          1
2 ENSMMUG00000065379              ND4L                       MT
3 ENSMMUG00000060150           5S_rRNA           QNVO02001753.1
4 ENSMMUG00000065383              ATP8                       MT
5 ENSMMUG00000063555                                          8
6 ENSMMUG00000065366              COX3                       MT
  Gene.stable.ID.1 HGNC.symbol      Gene.type Chromosome.scaffold.name.1
1  ENSG00000271254             protein_coding                 KI270711.1
2  ENSG00000212907     MT-ND4L protein_coding                         MT
3  ENSG00000278457                       rRNA                 KI270442.1
4  ENSG00000228253     MT-ATP8 protein_coding                         MT
5  ENSG00000198695      MT-ND6 protein_coding                         MT
6  ENSG00000198938      MT-CO3 protein_coding                         MT

dim(table)
[1] 23385     8

table[6000:6010,]
         Gene.stable.ID    VGNC.ID    Gene.name Chromosome.scaffold.name
6000 ENSMMUG00000007970                                                3
6001 ENSMMUG00000005091 VGNC:75524        NRBP1                       13
6002 ENSMMUG00000010505 VGNC:82153       GTF3C2                       13
6003 ENSMMUG00000009610 VGNC:81776      TMEM196                        3
6004 ENSMMUG00000061977                  SPIRE1                       18
6005 ENSMMUG00000000452                    LSM7                       19
6006 ENSMMUG00000003962 VGNC:76941         RRM1                       14
6007 ENSMMUG00000014165                   CHADL                       10
6008 ENSMMUG00000009818 VGNC:75393          NPY                        3
6009 ENSMMUG00000026995            mml-mir-320a                        8
6010 ENSMMUG00000003815 VGNC:69540     ADAMTSL5                       19
     Gene.stable.ID.1 HGNC.symbol      Gene.type Chromosome.scaffold.name.1
6000  ENSG00000146574       CCZ1B protein_coding                          7
6001  ENSG00000115216       NRBP1 protein_coding                          2
6002  ENSG00000115207      GTF3C2 protein_coding                          2
6003  ENSG00000173452     TMEM196 protein_coding                          7
6004  ENSG00000134278      SPIRE1 protein_coding                         18
6005  ENSG00000130332        LSM7 protein_coding                         19
6006  ENSG00000167325        RRM1 protein_coding                         11
6007  ENSG00000100399       CHADL protein_coding                         22
6008  ENSG00000122585         NPY protein_coding                          7
6009  ENSG00000208037     MIR320A          miRNA                          8
6010  ENSG00000185761    ADAMTSL5 protein_coding                         19

Is it possible to retrieve a Human ENST ID <--> Macaca ENST ID <--> Gene output from BioMart? The tables you posted contain gene identifiers. What the OP wants is transcript IDs.


Hm, it seems to be possible:

table <- getLDS(
  mart = rhesus,
  attributes = c('ensembl_gene_id', 'ensembl_transcript_id',
    'vgnc', 'external_gene_name', 'chromosome_name'),
  martL = human,
  attributesL = c('ensembl_gene_id', 'ensembl_transcript_id',
    'hgnc_symbol','gene_biotype', 'chromosome_name'))

head(table)
      Gene.stable.ID Transcript.stable.ID VGNC.ID Gene.name
1 ENSMMUG00000065354   ENSMMUT00000110390               ND5
2 ENSMMUG00000065353   ENSMMUT00000110412               ND3
3 ENSMMUG00000028672   ENSMMUT00000038257                  
4 ENSMMUG00000053115   ENSMMUT00000097737           5S_rRNA
5 ENSMMUG00000065380   ENSMMUT00000110416              COX2
6 ENSMMUG00000061090   ENSMMUT00000107734           5S_rRNA
  Chromosome.scaffold.name Gene.stable.ID.1 Transcript.stable.ID.1 HGNC.symbol
1                       MT  ENSG00000198786        ENST00000361567      MT-ND5
2                       MT  ENSG00000198840        ENST00000361227      MT-ND3
3                        9  ENSG00000198695        ENST00000361681      MT-ND6
4                        1  ENSG00000278457        ENST00000620265            
5                       MT  ENSG00000198712        ENST00000361739      MT-CO2
6                        1  ENSG00000278457        ENST00000620265            
       Gene.type Chromosome.scaffold.name.1
1 protein_coding                         MT
2 protein_coding                         MT
3 protein_coding                         MT
4           rRNA                 KI270442.1
5 protein_coding                         MT
6           rRNA                 KI270442.1
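
To see how many human transcripts each macaque transcript is paired with in a table like this, a quick base R check might look like the following. This is only a sketch: the small data frame `toy` and its IDs are made up for illustration, and only the two column names are taken from the `getLDS()` output above.

```r
# Hypothetical stand-in for the getLDS() result; the IDs below are invented
toy <- data.frame(
  Transcript.stable.ID   = c('MMUT_A', 'MMUT_A', 'MMUT_A', 'MMUT_B'),
  Transcript.stable.ID.1 = c('HSAT_1', 'HSAT_2', 'HSAT_3', 'HSAT_4'),
  stringsAsFactors = FALSE)

# Number of distinct human transcripts paired with each macaque transcript
n_human <- tapply(toy$Transcript.stable.ID.1, toy$Transcript.stable.ID,
  function(x) length(unique(x)))

# Macaque transcripts with exactly one human partner
one_to_one <- names(n_human)[n_human == 1]

# Macaque transcripts paired with several human transcripts
ambiguous <- names(n_human)[n_human > 1]
```

Running the same lines on the real table (in place of `toy`) would show which transcripts already have a single partner and which would need a further, sequence-based decision.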

This would be the answer to the specific question asked above.


This seems OK when a gene produces only one transcript. But when a gene has multiple transcripts, this code returns a table in which each transcript ID of the Macaca gene corresponds to all transcript IDs of the homologous human gene. How do I get a one-to-one transcript ID mapping?


"How do I get one-to-one transcript ID mapping?"

I don't think you will get that pre-computed. You will probably need to align the transcripts using BLAST or BLAT and decide for yourself whether you just want one representative.
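
If you go the BLAST route, picking one representative per macaque transcript from the tabular output could be sketched as below. The assumptions here are mine: hits are in the standard 12-column tabular format (`-outfmt 6`), the "best" hit is simply the one with the highest bit score, and the data frame `hits`, its IDs and scores, and the file name `hits.tsv` are all made up for illustration.

```r
# Default column names of BLAST tabular output (-outfmt 6)
blast_cols <- c('qseqid', 'sseqid', 'pident', 'length', 'mismatch', 'gapopen',
  'qstart', 'qend', 'sstart', 'send', 'evalue', 'bitscore')

# Invented hits for illustration; real data would be read with something like
# hits <- read.table('hits.tsv', sep = '\t', col.names = blast_cols)
# where 'hits.tsv' is a hypothetical file name
hits <- data.frame(
  qseqid   = c('MMUT_A', 'MMUT_A', 'MMUT_B', 'MMUT_B'),
  sseqid   = c('HSAT_1', 'HSAT_2', 'HSAT_3', 'HSAT_4'),
  bitscore = c(1500, 900, 400, 2100),
  stringsAsFactors = FALSE)

# Sort so that the highest-scoring hit of each query comes first,
# then keep only the first row per query
hits <- hits[order(hits$qseqid, -hits$bitscore), ]
best <- hits[!duplicated(hits$qseqid), c('qseqid', 'sseqid', 'bitscore')]
```

For a stricter mapping you could also run the search in the other direction (human against macaque) and keep only the pairs that are each other's best hit.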


This might be the most reasonable solution. Thanks a lot.

3.8 years ago
tamerg ▴ 100

An alternative is to use biobtreeR for this mapping:

library(biobtreeR)

# 1) set an output directory
bbUseOutDir("path of a directory")

# 2) build the data for the genomes of interest, including orthologs
# (fetches the latest raw data directly from Ensembl and indexes it for mapping)
bbBuildCustomDB(rawArgs="--tax 9606,9544 --ensembl-orthologs build")

# 3) start biobtreeR
bbStart()

# 4) perform the mapping
bbMapping("ENSMMUT00000110390","map(ensembl).map(ortholog).map(transcript)")

Note that the mapping function accepts multiple identifiers, and once step 2 has been run it can be skipped in later sessions. This also makes it possible to reproduce the same results.
