Asked 5.2 years ago by entropy
This seems like a basic task, but somehow I haven't found a good answer yet.
I am trying to convert UniProt IDs into gene symbols. I ran the code below, but about a quarter of my list returns NA. Is there a better way to get a complete conversion?
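library(org.Hs.eg.db)  # human annotation package; also attaches AnnotationDbi
uniprots[1:5]  # "A0QVH7" "A0R666" "A0SYQ0" "A3RGC1" "A5A4K8"
length(uniprots)  # 64102
# AnnotationDbi::select avoids clashes with dplyr::select
z <- AnnotationDbi::select(org.Hs.eg.db, keys = uniprots,
                           columns = "SYMBOL", keytype = "UNIPROT")
dim(z)  # 64320 2 (more rows than keys: some accessions map to several symbols)
sum(is.na(z$SYMBOL))  # 15789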
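A small follow-up sketch, not part of the original post: the NA rows in `z` can be separated out so that only the unmapped accessions need checking elsewhere, for example in UniProt's ID mapping tool. Because `select()` can return several rows per key, an accession is treated here as unmapped only if none of its rows has a symbol. The helper name `split_mapping` and the toy accessions and symbols are hypothetical, made up for illustration.

```r
# Hypothetical helper: split a select() result (columns UNIPROT, SYMBOL)
# into accessions with at least one symbol and accessions with none.
split_mapping <- function(z) {
  stopifnot(all(c("UNIPROT", "SYMBOL") %in% names(z)))
  has_symbol <- !is.na(z$SYMBOL)
  mapped <- unique(z$UNIPROT[has_symbol])
  # An accession counts as unmapped only if none of its rows has a symbol
  unmapped <- setdiff(unique(z$UNIPROT), mapped)
  list(mapped = mapped, unmapped = unmapped)
}

# Toy input with made-up accessions and symbols (illustration only)
toy <- data.frame(
  UNIPROT = c("ACC1", "ACC1", "ACC2", "ACC3"),
  SYMBOL  = c("GENE1", "GENE1B", NA, "GENE3"),
  stringsAsFactors = FALSE
)
res <- split_mapping(toy)
res$mapped    # "ACC1" "ACC3"
res$unmapped  # "ACC2"
```

With the real result, `writeLines(split_mapping(z)$unmapped, "unmapped.txt")` would write a list that can be pasted into the mapping tool (the file name is arbitrary).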
Verify the ones that are not converting using UniProt's ID mapping tool (from UniProt ID to Gene name). They may either not be human genes or have been deprecated in recent releases.
Thanks. It looks like there are some. I just tried it and got this: "1,370 out of 1,433 identifiers from UniProtKB AC/ID were successfully mapped to 1,092 Gene name IDs."
For example, I got this:
and so on...
Actually, I uploaded 15K IDs but got only that many results back; I'm not sure why. I selected From: "UniProtKB AC/ID" and To: "GENE NAME" in the drop-down menus.
That is a mouse protein.
This is a cow protein.
Your R code above uses a human annotation database (org.Hs.eg.db), so it predictably cannot find these IDs.
Thanks. I think that answers my question. Is there a way to scan all at once instead of looping per database?
I think you are asking whether this can be done via R. Perhaps someone else can suggest an appropriate package. You could also do this by accessing UniProt's site programmatically.
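A minimal sketch of the programmatic route, assuming UniProt's current REST API at `https://rest.uniprot.org` (which postdates this thread) and base R only. Searching UniProtKB by accession returns each entry's primary gene name together with its organism, so human, mouse, cow and other proteins come back in one pass instead of one annotation package per species. The helper names `build_uniprot_url` and `fetch_gene_names` and the chunk size of 100 are hypothetical choices; obsolete (deleted or demerged) accessions will not appear in the results and would still need the ID mapping tool.

```r
# Hypothetical helper: build one UniProtKB stream URL for a batch of
# accessions, requesting accession, primary gene name and organism as TSV.
build_uniprot_url <- function(ids) {
  query <- paste0("accession:", ids, collapse = " OR ")
  paste0("https://rest.uniprot.org/uniprotkb/stream",
         "?query=", utils::URLencode(query, reserved = TRUE),
         "&fields=accession,gene_primary,organism_name",
         "&format=tsv")
}

# Hypothetical helper: query in chunks to keep URLs short (needs network).
fetch_gene_names <- function(ids, chunk_size = 100) {
  chunks <- split(ids, ceiling(seq_along(ids) / chunk_size))
  results <- lapply(chunks, function(chunk) {
    # read.delim accepts a URL and parses the tab-separated response
    utils::read.delim(build_uniprot_url(chunk), check.names = FALSE,
                      stringsAsFactors = FALSE)
  })
  do.call(rbind, results)  # NULL when ids is empty
}

# Usage (requires internet access):
# genes <- fetch_gene_names(uniprots)
# head(genes)  # one row per UniProtKB entry found, with its organism
```

Keeping the organism column alongside the gene name makes it easy to see which of the previously unmapped accessions belong to mouse, cow, or other species.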