Based on my limited investigation, I'm not convinced the cause is AnnotationDbi; more likely it is the version of the org.Hs.eg.db package you have installed. I think the lookups mainly go through org.Hs.eg.db, finding matches via the Entrez ID. You can also swap hgu219.db for org.Hs.eg.db and the output is the same. I didn't see any actual UNIPROT IDs stored directly in the hgu219.db package.
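One way to check the swap described above is to run the same lookup against both annotation objects and compare the UNIPROT column. This is only a sketch: `lookup_symbol` is a hypothetical wrapper, not part of any package, and the comparison runs only if hgu219.db and org.Hs.eg.db are installed.

```r
# Hypothetical wrapper around AnnotationDbi::select(); the columns and
# keytype are the ones used in the console output in this answer.
lookup_symbol <- function(db, symbol) {
  AnnotationDbi::select(db,
                        keys    = symbol,
                        columns = c("GENENAME", "ENTREZID", "UNIPROT"),
                        keytype = "SYMBOL")
}

# Only attempt the comparison when the annotation packages are available.
if (requireNamespace("AnnotationDbi", quietly = TRUE) &&
    requireNamespace("hgu219.db", quietly = TRUE) &&
    requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  via_chip <- lookup_symbol(hgu219.db::hgu219.db, "ATP11AUN")
  via_org  <- lookup_symbol(org.Hs.eg.db::org.Hs.eg.db, "ATP11AUN")
  # TRUE here would be consistent with the chip package delegating the
  # UNIPROT lookup to org.Hs.eg.db (an assumption, not verified internals).
  print(identical(via_chip$UNIPROT, via_org$UNIPROT))
}
```

If the two results agree on both machines, that points at the installed org.Hs.eg.db version rather than at hgu219.db itself.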
This does look like a potentially serious issue. It is unclear why UniProt IDs were dropped, and which ones. I see you have already made a Bioconductor post, which is good.
To expand on this, I looked at the setdiff between installations: IDs are also lost with the newer version in my case, but not to the magnitude you reported.
I'll use one accession as an example (Q6ZP68): it disappears completely in the newer version despite being annotated as reviewed in UniProt.
Computer 1: hgu219.db_3.2.3, org.Hs.eg.db_3.11.4, AnnotationDbi_1.50.3
> select(hgu219.db, keys=c("Q6ZP68"),columns=c("SYMBOL","GENENAME","ENTREZID"), keytype="UNIPROT")
Error in .testForValidKeys(x, keys, keytype, fks) :
None of the keys entered are valid keys for 'UNIPROT'. Please use the keys method to see a listing of valid arguments.
> select(hgu219.db, keys=c("ATP11AUN"),columns=c("GENENAME","ENTREZID","UNIPROT"), keytype="SYMBOL")
'select()' returned 1:1 mapping between keys and columns
    SYMBOL                 GENENAME ENTREZID UNIPROT
1 ATP11AUN ATP11A upstream neighbor   400165    <NA>
Computer 2: hgu219.db_3.2.3, org.Hs.eg.db_3.8.2, AnnotationDbi_1.46.
> select(hgu219.db, keys=c("Q6ZP68"),columns=c("SYMBOL","GENENAME","ENTREZID"), keytype="UNIPROT")
'select()' returned 1:1 mapping between keys and columns
  UNIPROT   SYMBOL                 GENENAME ENTREZID
1  Q6ZP68 ATP11AUN ATP11A upstream neighbor   400165
> select(hgu219.db, keys=c("ATP11AUN"),columns=c("GENENAME","ENTREZID","UNIPROT"), keytype="SYMBOL")
'select()' returned 1:1 mapping between keys and columns
    SYMBOL                 GENENAME ENTREZID UNIPROT
1 ATP11AUN ATP11A upstream neighbor   400165  Q6ZP68
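The setdiff comparison between installations could be reproduced roughly as follows. This is a minimal sketch, not a definitive procedure: `compare_keys` and the output file name are hypothetical, the key dump runs only if org.Hs.eg.db is installed, and the toy vectors at the end use placeholder keys (`KEY_A`, `KEY_B`) alongside the one real accession from this thread.

```r
# Hypothetical helper (not part of AnnotationDbi): compare two key vectors
# collected on different installations.
compare_keys <- function(old_keys, new_keys) {
  list(
    only_in_old = setdiff(old_keys, new_keys),   # keys dropped in the newer install
    only_in_new = setdiff(new_keys, old_keys),   # keys added in the newer install
    n_shared    = length(intersect(old_keys, new_keys))
  )
}

# Run this part on each computer; it writes the valid UNIPROT keys to a file
# in the working directory, named after the installed org.Hs.eg.db version.
if (requireNamespace("AnnotationDbi", quietly = TRUE) &&
    requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  uniprot_keys <- AnnotationDbi::keys(org.Hs.eg.db::org.Hs.eg.db,
                                      keytype = "UNIPROT")
  out_file <- paste0("uniprot_keys_org.Hs.eg.db_",
                     as.character(packageVersion("org.Hs.eg.db")), ".rds")
  saveRDS(uniprot_keys, out_file)
}

# Toy illustration with placeholder keys; real input would come from
# readRDS() on the two files produced above.
old_keys <- c("Q6ZP68", "KEY_A", "KEY_B")
new_keys <- c("KEY_A", "KEY_B")
res <- compare_keys(old_keys, new_keys)
print(res$only_in_old)  # [1] "Q6ZP68"
```

Recording the package version in the file name keeps the comparison tied to the exact installation that produced each key list.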
Try packageVersion("hgu219.db") to check the exact version of hgu219.db; don't guess.
Yes, of course. This is exactly what I did. On both systems:
Can you provide the version of R, Bioconductor, hgu219.db and AnnotationDbi packages you are using on each computer/platform?
OK, but as explained in my question, I don't see why anything besides the hgu219.db version is relevant.
Computer 1:
Computer 2:
And what do you get when you compare the keys? For example, setdiff(computer1_keys, computer2_keys).
Sorry, but it sounds like you don't know the answer to my question?
One has a few hundred more UNIPROTs than the other.
Exactly 361 more UNIPROTs with the older version of AnnotationDbi, i.e. the new version gives a subset of the old version.
As @MatthewP observed, the new version of AnnotationDbi is dropping a few hundred UNIPROTs.
Just guessing, but is AnnotationDbi keeping a list of "stale" UNIPROTs?