I am essentially looking for a gene-to-protein mapping/lookup table. I have gene names (in BioMart the attribute is called external_gene_name, to be specific). They look something like this:
"PLPPR3" "TSPY10" "ELANE" "DENND11" "TSPY4" "CCL15" "PSMB3" "TAS2R46" "ZBTB9" "KIR2DL5A" "RWDD2B"
I want to know which proteins these genes produce. The protein IDs should look like this (cell surface markers, for example):
"CD45RA" "CD27" "CD16" "GPR56" "CD56" "CD57" "CD94" "CD158"
I am not sure what the official terminology for these IDs is. UniProt? Swiss-Prot? Something else? Does anyone know where to find mappings from these gene names to protein IDs? If they are in BioMart, perhaps someone knows the name of the field? Thanks!
Update: This is the R code that I use to fetch the data:
library(biomaRt)
# Ensembl genes mart, human dataset
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset(mart=mart,dataset="hsapiens_gene_ensembl")
# gene name + protein_id for all protein-coding genes
pdata <- getBM(mart=mart,attributes=c("external_gene_name","protein_id"),
               filters=c("biotype"),values=list("protein_coding"),useCache=FALSE)
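Rather than guessing the field name, biomaRt can search the attribute list for you. A minimal sketch, assuming biomaRt is installed and the Ensembl server is reachable; `find_protein_attributes` is a hypothetical helper name, and `useEnsembl(biomart = "genes", ...)` is used here as a shorter way to reach the same Ensembl genes mart as `useMart("ENSEMBL_MART_ENSEMBL")`:

```r
# Hypothetical helper: list BioMart attributes whose name or description
# matches `pattern`. Assumes biomaRt is installed and Ensembl is reachable.
find_protein_attributes <- function(pattern = "uniprot") {
  mart <- biomaRt::useEnsembl(biomart = "genes",
                              dataset = "hsapiens_gene_ensembl")
  # searchAttributes() returns a data frame of the matching attributes
  biomaRt::searchAttributes(mart = mart, pattern = pattern)
}
```

For example, `find_protein_attributes("uniprot")` and `find_protein_attributes("peptide")` should list the UniProt and Ensembl protein ID attributes; the `name` column holds the value to pass to `getBM(attributes = ...)`.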
   external_gene_name protein_id
1
2                       ABK41909
3                       BAF82881
4                       BAG36999
5                       EAX09448
6            TMPRSS15   AAC50138
7            TMPRSS15   CAB65555
8            TMPRSS15   CAB90389
9            TMPRSS15   CAB90392
10           TMPRSS15   AAI11750
11           TMPRSS15
12            SMIM34B
13             GATD3B
14             GATD3B   BAA20888
...
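The output above has rows with a blank gene name or a blank ID, and several IDs per gene. A minimal sketch of turning such a getBM() result into a one-row-per-gene lookup table, assuming missing values come back as empty strings (or NA); `tidy_gene_ids` is a hypothetical helper and works with whichever ID column you end up requesting:

```r
# Hypothetical helper: collapse a two-column getBM() result into a lookup
# table with one row per gene and that gene's IDs joined by ";".
tidy_gene_ids <- function(df, gene_col = "external_gene_name",
                          id_col = "protein_id") {
  g <- df[[gene_col]]
  p <- df[[id_col]]
  # drop rows where either value is missing (empty string or NA)
  keep <- !is.na(g) & g != "" & !is.na(p) & p != ""
  df <- unique(df[keep, c(gene_col, id_col)])
  if (nrow(df) == 0) {
    return(data.frame(gene = character(0), ids = character(0)))
  }
  # one entry per gene; IDs sorted so the output is reproducible
  ids <- tapply(as.character(df[[id_col]]), as.character(df[[gene_col]]),
                function(x) paste(sort(x), collapse = ";"))
  data.frame(gene = names(ids), ids = as.vector(ids),
             stringsAsFactors = FALSE)
}
```

Applied to the rows shown above, this keeps only GATD3B and TMPRSS15, since the other rows lack either a gene name or an ID.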
The attribute protein_id returns some strange-looking protein IDs, so it should be replaced with something else. I am not sure what to use there.
That's an INSDC protein ID (a protein accession from the ENA/GenBank/DDBJ records, not a UniProt accession). See http://www.ensembl.org/Homo_sapiens/Transcript/Similarity?db=core;g=ENSG00000277117;r=21:5022493-5040666;t=ENST00000623960
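If Ensembl protein IDs or UniProt accessions are wanted instead, the query can be restricted to the genes of interest and asked for those attributes. A minimal sketch, assuming biomaRt is installed, Ensembl is reachable, and that `ensembl_peptide_id` and `uniprotswissprot` are still the current attribute names (check with `listAttributes()` before relying on them); `fetch_protein_ids` is a hypothetical helper. Keep in mind that both are database accessions: CD labels such as CD16 are protein names (UniProt lists them as CD antigen names), so they will not come back from this query.

```r
# Hypothetical helper: map gene symbols to Ensembl protein IDs and reviewed
# UniProt (Swiss-Prot) accessions. Attribute names are assumed to be current.
genes <- c("PLPPR3", "TSPY10", "ELANE", "DENND11", "TSPY4", "CCL15",
           "PSMB3", "TAS2R46", "ZBTB9", "KIR2DL5A", "RWDD2B")

fetch_protein_ids <- function(symbols) {
  mart <- biomaRt::useEnsembl(biomart = "genes",
                              dataset = "hsapiens_gene_ensembl")
  biomaRt::getBM(
    attributes = c("external_gene_name",  # gene symbol, as in the question
                   "ensembl_peptide_id",  # Ensembl protein ID (ENSP...)
                   "uniprotswissprot"),   # reviewed UniProt accession
    filters = "external_gene_name",       # restrict to the listed genes
    values = symbols,
    mart = mart
  )
}
```

Usage would be `res <- fetch_protein_ids(genes)`; genes without a protein product or without a Swiss-Prot entry will show blank values in the corresponding columns.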