Zebrafish Ensembl gene ID to RefSeq protein accession?
Hi,
I am trying to convert zebrafish Ensembl gene IDs to RefSeq protein accessions. I did this partially with BioMart, but only got ~12k RefSeq accessions for the ~22k Ensembl IDs I queried. Are there any other tools I could use to convert the remaining ~10k Ensembl gene IDs, ideally ones that folks have used for zebrafish? From my Google searches, most of the available tools seem to be optimized for rat/mouse/human.
Thanks,
Gabe
BioMart
Ensembl
RefSeq
written 4.5 years ago by gpreising, updated 16 months ago by Ram
You can try the org.Dr.eg.db package - it may match more IDs for you:
library(org.Dr.eg.db)
ens <- c('ENSDARG00000061451', 'ENSDARG00000061749',
         'ENSDARG00000061764')
keytypes(org.Dr.eg.db)
[1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
[6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
[11] "GO" "GOALL" "IPI" "ONTOLOGY" "ONTOLOGYALL"
[16] "PATH" "PFAM" "PMID" "PROSITE" "REFSEQ"
[21] "SYMBOL" "UNIGENE" "UNIPROT" "ZFIN"
mapIds(org.Dr.eg.db, keys = ens,
       column = 'REFSEQ', keytype = 'ENSEMBL')
'select()' returned 1:many mapping between keys and columns
ENSDARG00000061451 ENSDARG00000061749 ENSDARG00000061764
    "NM_001079967"                 NA     "XM_005173182"
select(org.Dr.eg.db, keys = ens,
       columns = c('REFSEQ', 'ENTREZID', 'SYMBOL', 'ENSEMBL'),
       keytype = 'ENSEMBL')
'select()' returned 1:many mapping between keys and columns
              ENSEMBL       REFSEQ ENTREZID SYMBOL
1  ENSDARG00000061451 NM_001079967   558048  n4bp2
2  ENSDARG00000061451 NP_001073436   558048  n4bp2
3  ENSDARG00000061451 XM_021475000   558048  n4bp2
4  ENSDARG00000061451 XP_021330675   558048  n4bp2
5  ENSDARG00000061749         <NA>     <NA>   <NA>
6  ENSDARG00000061764 XM_005173182   559276  ahnak
7  ENSDARG00000061764 XM_005173183   559276  ahnak
8  ENSDARG00000061764 XM_005173184   559276  ahnak
9  ENSDARG00000061764 XM_005173188   559276  ahnak
10 ENSDARG00000061764 XM_009291066   559276  ahnak
11 ENSDARG00000061764 XM_017359122   559276  ahnak
12 ENSDARG00000061764 XM_021481163   559276  ahnak
13 ENSDARG00000061764 XM_021481164   559276  ahnak
14 ENSDARG00000061764 XP_005173239   559276  ahnak
15 ENSDARG00000061764 XP_005173240   559276  ahnak
16 ENSDARG00000061764 XP_005173241   559276  ahnak
17 ENSDARG00000061764 XP_005173245   559276  ahnak
18 ENSDARG00000061764 XP_009289341   559276  ahnak
19 ENSDARG00000061764 XP_017214611   559276  ahnak
20 ENSDARG00000061764 XP_021336838   559276  ahnak
21 ENSDARG00000061764 XP_021336839   559276  ahnak
Kevin
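Since the question asks for protein accessions specifically, note that the REFSEQ column above mixes transcript accessions (NM_, XM_) with protein accessions (NP_, XP_), and that mapIds() keeps only the first match per key by default. Below is a minimal base-R sketch of one way to keep just the protein rows. It is not part of the original answer: the helper name keep_refseq_proteins is hypothetical, and the small data frame is a hand-copied subset of the select() output above so that the block runs without org.Dr.eg.db installed.

```r
# Hand-copied subset of the select() output shown above (illustration only).
hits <- data.frame(
  ENSEMBL = c("ENSDARG00000061451", "ENSDARG00000061451",
              "ENSDARG00000061451", "ENSDARG00000061451",
              "ENSDARG00000061749",
              "ENSDARG00000061764", "ENSDARG00000061764"),
  REFSEQ  = c("NM_001079967", "NP_001073436",
              "XM_021475000", "XP_021330675",
              NA,
              "XM_005173182", "XP_005173239"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose accession starts with NP_ (curated
# protein) or XP_ (predicted protein); drop transcripts and NA rows.
keep_refseq_proteins <- function(df, column = "REFSEQ") {
  is_protein <- !is.na(df[[column]]) & grepl("^[NX]P_", df[[column]])
  df[is_protein, , drop = FALSE]
}

proteins <- keep_refseq_proteins(hits)

# One list element per Ensembl gene, holding all of its protein accessions.
proteins_by_gene <- split(proteins$REFSEQ, proteins$ENSEMBL)

# Ensembl genes that ended up with no RefSeq protein accession at all.
unmapped <- setdiff(unique(hits$ENSEMBL), names(proteins_by_gene))

print(proteins_by_gene)
print(unmapped)
```

With the real data, the same helper could be applied directly to the data frame returned by select(). Genes that land in unmapped (such as ENSDARG00000061749 here) have no RefSeq entry in the annotation package, so they would still need another source.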
Please provide examples of IDs that don't match. In this case, it may be possible to use Entrez Direct to get the info you want.
Sure, here are a few IDs that I could get gene names for but not RefSeq accessions:
ENSDARG00000061451 ENSDARG00000061749 ENSDARG00000061764
Would the following link help? https://david.ncifcrf.gov/conversion.jsp