[This question has also been posted on the Bioconductor forum. I am reposting it here because it received little feedback there.]
I am trying to do a "classical" mapping of UniProt IDs to Ensembl gene IDs, starting from the protein IDs identified in a zebrafish mass-spec experiment. However, for several proteins my biomaRt query returns no information, although they are present in the UniProt database and have an Ensembl gene ID assigned. Some UniProt IDs (e.g. F1QCB4) belong to entries that have been deleted from UniProt, but this is not the case for all of them.
Am I missing something?
Here is my code:
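library(biomaRt)
# UniProt accessions identified in the zebrafish mass-spec experiment
prot_ids <- c("F1QCB4", "F1R8H7", "A0JMF6", "F1QU18", "A0JMK7", "A0MTA1")
# Connect to the UniProt mart
uniProt <- useMart("unimart", dataset = "uniprot")
# Request name, Ensembl gene ID and gene name for each accession
getBM(
  attributes = c("accession", "name", "ensembl_id", "gene_name"),
  filters = "accession",
  values = prot_ids,
  mart = uniProt)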
[1] accession name ensembl_id gene_name
<0 rows> (or 0-length row.names)
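To pin down exactly which accessions come back empty, a check along the following lines could be run on the query result. This is only a minimal sketch: `res` is a stand-in that mimics the empty data frame shown above (same column names, zero rows) so that the snippet runs without a mart connection, and `missing_ids` is a name made up for illustration. In practice `res` would be the value returned by getBM().

```r
# Accessions submitted to the query (same as in the question)
prot_ids <- c("F1QCB4", "F1R8H7", "A0JMF6", "F1QU18", "A0JMK7", "A0MTA1")

# Stand-in for the getBM() result: an empty data frame with the same
# columns as the output above (assumption: replace with the real result)
res <- data.frame(
  accession  = character(0),
  name       = character(0),
  ensembl_id = character(0),
  gene_name  = character(0),
  stringsAsFactors = FALSE
)

# Accessions that were queried but are absent from the result
missing_ids <- setdiff(prot_ids, res$accession)
print(missing_ids)
```

With a result of zero rows, every submitted accession ends up in `missing_ids`. With a partial result, only the unmatched ones would remain, which makes it easier to look those up individually in UniProt.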
sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] biomaRt_2.22.0 VennDiagram_1.6.9 RColorBrewer_1.1-2
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.28.1 Biobase_2.26.0 BiocGenerics_0.12.1
[4] bitops_1.0-6 DBI_0.3.1 GenomeInfoDb_1.2.3
[7] IRanges_2.0.0 parallel_3.1.2 RCurl_1.95-4.5
[10] RSQLite_1.0.0 S4Vectors_0.4.0 stats4_3.1.2
[13] tcltk_3.1.2 tools_3.1.2 XML_3.98-1.1
Thank you. It is interesting that connecting to UniProt, as I did, and connecting to Ensembl, as in your example, give different results, with the Ensembl mart seemingly more complete than the UniProt one.