key lookup changes with AnnotationDbi version
4.2 years ago
james ▴ 20

I thought that all of the information about each chipset (i.e. platform) was in the corresponding R package.

For example, hgu219.db is the annotation package for the hgu219 platform.

However, my key lookup results differ depending on the package version of AnnotationDbi, even when the hgu219.db package versions are the same.

So for example,

keys(hgu219.db, keytype  = 'UNIPROT')

gives a different list of UNIPROTs depending on the AnnotationDbi version.

I thought all of the info was in the hgu219.db package. My thinking must be incorrect?

Can someone explain why this is happening?

R Bioconductor • 2.3k views

Use packageVersion("hgu219.db") to check the exact version of hgu219.db rather than guessing.

Yes, of course. This is exactly what I did. On both systems:

> packageVersion('hgu219.db')
[1] ‘3.2.3’

Can you provide the versions of R, Bioconductor, and the hgu219.db and AnnotationDbi packages you are using on each computer/platform?

OK, but as explained in my question, I don't see why anything besides the hgu219.db version is relevant.

Computer 1:

> R.Version()$version.string
[1] "R version 3.6.3 (2020-02-29)"
> library(BiocManager)
Bioconductor version 3.10 (BiocManager 1.30.10), ?BiocManager::install for help
Bioconductor version '3.10' is out-of-date; the current release version '3.11' is available with R version '4.0';     
see https://bioconductor.org/install
> packageVersion('hgu219.db')
[1] ‘3.2.3’
> packageVersion('AnnotationDbi')
[1] ‘1.48.0’

Computer 2:

> R.Version()$version.string
[1] "R version 3.6.2 (2019-12-12)"
> library(BiocManager)
Bioconductor version 3.9 (BiocManager 1.30.4), ?BiocManager::install for help
Bioconductor version '3.9' is out-of-date; the current release version '3.11'
is available with R version '4.0'; see https://bioconductor.org/install
> packageVersion('hgu219.db')
[1] ‘3.2.3’
> packageVersion('AnnotationDbi')
[1] ‘1.46.0’

And what do you get when you compare the keys? For example, setdiff(computer1_keys, computer2_keys).
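
A minimal sketch of that comparison in base R, assuming the output of keys(hgu219.db, keytype = 'UNIPROT') has been saved on each machine. The helper name compare_keys and the file names in the comments are hypothetical, chosen only for illustration:

```r
# Hypothetical helper: report which keys are unique to each machine.
compare_keys <- function(keys1, keys2) {
  list(
    only_in_1 = setdiff(keys1, keys2),           # present on computer 1 only
    only_in_2 = setdiff(keys2, keys1),           # present on computer 2 only
    n_shared  = length(intersect(keys1, keys2))  # present on both
  )
}

# Assumed workflow (file names are placeholders):
#   on each machine:
#     saveRDS(keys(hgu219.db, keytype = "UNIPROT"), "keys_computerN.rds")
#   then, on one machine:
#     compare_keys(readRDS("keys_computer1.rds"), readRDS("keys_computer2.rds"))
```

Running setdiff in both directions matters here, because a single call cannot tell whether one result is a strict subset of the other.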

Sorry, but it sounds like you don't know the answer to my question?

One has a few hundred more UNIPROTs than the other.

Exactly 361 more UNIPROTs with the older version of AnnotationDbi, i.e. the new version gives a subset of the old version.

As @MatthewP observed, the new version of AnnotationDbi is dropping a few hundred UNIPROTs.

Just guessing, but is AnnotationDbi keeping a list of "stale" UNIPROTs?

4.2 years ago
Lluís R. ★ 1.2k

You are using different Bioconductor versions on the two computers, which affects the underlying data for hgu219.db. For instance, the hgu219.db version on computer 2 is newer than the AnnotationDbi version, which might be one of the reasons for this, as hgu219.db uses data and methods provided by AnnotationDbi and other Bioconductor packages for that release.

Check with BiocManager::valid() and follow its advice to set up a valid Bioconductor installation, and use the same Bioconductor version on both machines if you want consistent results.
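
To see at a glance what differs between the two machines, it can help to record the relevant versions on each one and compare the output side by side. A minimal sketch in base R; the helper name report_versions and the default package list are assumptions for illustration and are not part of BiocManager:

```r
# Hypothetical helper: installed version of each package, or NA if missing.
report_versions <- function(pkgs = c("BiocManager", "AnnotationDbi",
                                     "hgu219.db", "org.Hs.eg.db")) {
  vapply(pkgs, function(p) {
    # system.file() returns "" when the package is not installed
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

print(R.version.string)   # R version on this machine
print(report_versions())  # named character vector of package versions
```

Any package whose version differs between the two outputs is a candidate explanation for differing lookup results.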

4.2 years ago

Based on my limited investigation, I don't think the cause is AnnotationDbi; more likely it is the version of the org.Hs.eg.db package you have installed. I think the lookups mainly go through org.Hs.eg.db, finding matches via the Entrez ID. You can also replace hgu219.db with org.Hs.eg.db in the call and the output is the same. I didn't see any actual UNIPROT IDs directly in the hgu219.db package.

It seems that this is in fact a potentially serious issue. It is unclear why UniProt IDs were dropped, and which ones. I see you have already made a Bioconductor post, which is good.

To expand on this, I looked at the setdiff between installations: additional IDs are lost with the newer version in my case too, but not to the magnitude you reported.

As an example, take one accession (Q6ZP68), which disappears completely in the newer version despite being annotated as reviewed in UniProt.

Computer 1: hgu219.db_3.2.3, org.Hs.eg.db_3.11.4, AnnotationDbi_1.50.3

> select(hgu219.db, keys=c("Q6ZP68"),columns=c("SYMBOL","GENENAME","ENTREZID"), keytype="UNIPROT")
Error in .testForValidKeys(x, keys, keytype, fks) :
  None of the keys entered are valid keys for 'UNIPROT'. Please use the keys method to see a listing of valid arguments.
> select(hgu219.db, keys=c("ATP11AUN"),columns=c("GENENAME","ENTREZID","UNIPROT"), keytype="SYMBOL")
'select()' returned 1:1 mapping between keys and columns
    SYMBOL                 GENENAME ENTREZID UNIPROT
1 ATP11AUN ATP11A upstream neighbor   400165    <NA>

Computer 2: hgu219.db_3.2.3, org.Hs.eg.db_3.8.2, AnnotationDbi_1.46.

> select(hgu219.db, keys=c("Q6ZP68"),columns=c("SYMBOL","GENENAME","ENTREZID"), keytype="UNIPROT")
'select()' returned 1:1 mapping between keys and columns
  UNIPROT   SYMBOL                 GENENAME ENTREZID
1  Q6ZP68 ATP11AUN ATP11A upstream neighbor   400165
> select(hgu219.db, keys=c("ATP11AUN"),columns=c("GENENAME","ENTREZID","UNIPROT"), keytype="SYMBOL")
'select()' returned 1:1 mapping between keys and columns
    SYMBOL                 GENENAME ENTREZID UNIPROT
1 ATP11AUN ATP11A upstream neighbor   400165  Q6ZP68

The official response is that responsibility for the missing annotations lies with NCBI. The R packages just wrap the publicly available data.

https://support.bioconductor.org/p/134782/
