Hi, a quick check on NCBI Gene reveals that the official symbol for this gene is PRXL2C, not AAED1. As such, I would not have expected org.Hs.eg.db (which uses 'recent' annotation) to have AAED1. However, I can see that EnsDb.Hsapiens.v86 (an older version) does have it. So, there must have been an annotation change in more recent Ensembl versions. It is important to remember that gene annotation is constantly changing.
org.Hs.eg.db
library(org.Hs.eg.db)
select(org.Hs.eg.db,
  keys = 'AAED1',
  columns = c('ENSEMBL', 'SYMBOL'),
  keytype = 'SYMBOL')
Error in .testForValidKeys(x, keys, keytype, fks) :
None of the keys entered are valid keys for 'SYMBOL'. Please use the keys method to see a listing of valid arguments.
EnsDb.Hsapiens.v86
library(EnsDb.Hsapiens.v86)
select(EnsDb.Hsapiens.v86,
  keys = 'AAED1',
  columns = c('GENEID', 'SYMBOL'),
  keytype = 'SYMBOL')
GENEID SYMBOL
1 ENSG00000158122 AAED1
------------
If we instead check for the official symbol, PRXL2C, in org.Hs.eg.db:
select(org.Hs.eg.db,
  keys = 'PRXL2C',
  columns = c('ENSEMBL', 'SYMBOL'),
  keytype = 'SYMBOL')
SYMBOL ENSEMBL
1 PRXL2C ENSG00000158122
----------
In situations like this, one can use limma's alias2SymbolTable() to map gene aliases to their current official symbols.
limma::alias2SymbolTable('AAED1', species = 'Hs')
[1] "PRXL2C"
This simple example also highlights why it's better to use Ensembl or Entrez gene IDs for analyses.
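For a whole list of symbols, the same idea can be sketched roughly as follows: map each input to its official symbol first, then look up stable IDs. The input vector here is made up for illustration, and the result depends on the annotation versions you have installed, so treat this as a starting point rather than a definitive recipe.

```r
library(limma)
library(org.Hs.eg.db)

# Hypothetical input: symbols as they might appear in a supplementary file
symbols <- c('AAED1', 'PRXL2C', 'TP53')

# Map each alias to its current official symbol (NA where nothing matches)
official <- alias2SymbolTable(symbols, species = 'Hs')

# Look up Ensembl and Entrez IDs for the symbols that were resolved
ids <- AnnotationDbi::select(org.Hs.eg.db,
  keys = unique(official[!is.na(official)]),
  columns = c('ENSEMBL', 'ENTREZID'),
  keytype = 'SYMBOL')

# Join back to the original input so every row stays traceable
lookup <- data.frame(input = symbols, SYMBOL = official)
result <- merge(lookup, ids, by = 'SYMBOL', all.x = TRUE)
print(result)
```

Note that a symbol can map to more than one Ensembl ID, so the merged table may have more rows than the input; it is worth checking for duplicates before using it downstream.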
Kevin
Thanks, Kevin! That worked perfectly. As always, your clear and thoughtful answer is much appreciated.
Yeah, I definitely think it's easier to work in unique identifiers and usually only convert to symbol for reporting. Then I happen to need this published data and figure I'll just use a GEO supplementary file to "save time"... Ends up taking 10x longer than just re-quantifying their raw SRA data in Salmon...
The alias2SymbolTable() approach you mentioned would also be useful in the following situation: