Can I map gene symbols to PDB ids?
8.5 years ago
haohanw ▴ 90

I have a list of around 8,000 gene symbols (like NAT2, ADA, CDH2, etc.). Can I map them to PDB (Protein Data Bank) IDs?

I have tried DAVID, but it does not seem to offer PDB as an option. I also tried bioDBnet, but it returns just '-' for every gene symbol I input.

Then, from this site, it seems there is rarely any mapping information from gene symbol to PDB, so can I even do that? Do we believe there is a (roughly) one-to-one mapping between genes and proteins?

If there is and I can, which site is a good choice?

gene protein symbol id • 8.2k views
ADD COMMENT
1
Entering edit mode
8.5 years ago

Yes, you can remap a PDB entry to a gene.

Remember:

  • Gene > transcription > Transcript > translation > Protein
  • Protein data is represented as sequence (primary structure), predicted structure (secondary structure) and 3D structure determined by X-ray crystallography or NMR spectroscopy (the 3D data sets are in PDB, which is a central repository of structural data)
  • PDB files are structural data of proteins
  • The blueprint is your genes, but the transcript has to go through processing steps before protein synthesis.
  • Splicing, exon structure, RNA editing and other post-transcriptional modifications are critical steps.
  • You will have a representative protein structure. Don't expect the mapping to be a direct 3:1 (codon : amino acid ratio).

RCSB (the site you referred to in your question) or UniProt ID mapping is the best way to do this mapping.

PS. RTWP on Protein Structure

ADD COMMENT
0
Entering edit mode

Thanks. It seems that UniProt ID mapping can only map to UniProt IDs, though. And it seems that RCSB shows that only a few gene symbols have PDB entries (which confuses me the most). Could you have a look at the site?

ADD REPLY
0
Entering edit mode

Not all genes have corresponding protein structure data. This is a known problem: some proteins are easy to characterize, but some aren't.

Can you share your gene list?

ADD REPLY
0
Entering edit mode

Thanks. It seems that in my situation, most genes do not have corresponding proteins. Here is my list. Please have a look.

ADD REPLY
0
Entering edit mode

I've queried your list (around 7.4k genes in the latest human assembly) using BioMart and it seems 2,696 have PDB IDs. C21orf62, for example, does not match any PDB IDs. It may be worth bringing some of these examples to PDB's attention by contacting them directly.
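Once you have a symbol-to-PDB table from any of the tools mentioned in this thread, splitting your list into genes with and without a structure is a few lines of Python. This is only a minimal sketch: `split_by_structure` is a hypothetical helper, and `GENE_A`, `GENE_B`, `GENE_C` and their PDB IDs are made-up placeholders, not real entries.

```python
def split_by_structure(symbols, gene_to_pdb):
    """Split symbols into (with PDB IDs, without PDB IDs), preserving input order."""
    with_pdb, without_pdb = [], []
    for symbol in symbols:
        # A symbol counts as "with PDB" only if it maps to a non-empty list.
        if gene_to_pdb.get(symbol):
            with_pdb.append(symbol)
        else:
            without_pdb.append(symbol)
    return with_pdb, without_pdb


# Hypothetical mapping: GENE_A, GENE_B and their PDB IDs are placeholders.
# C21orf62 is left empty because the query above found no PDB ID for it.
mapping = {"GENE_A": ["1ABC", "2DEF"], "GENE_B": ["3GHI"], "C21orf62": []}
symbols = ["GENE_A", "C21orf62", "GENE_B", "GENE_C"]

with_pdb, without_pdb = split_by_structure(symbols, mapping)
coverage = len(with_pdb) / len(symbols)
print(f"{len(with_pdb)} of {len(symbols)} symbols have a PDB ID ({coverage:.0%})")
print("No structure found for:", ", ".join(without_pdb))
```

Symbols that are missing from the table altogether (like `GENE_C` here) are treated the same as symbols with an empty list, which is probably what you want for a first pass.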

ADD REPLY
1
Entering edit mode
8.5 years ago
ahmedakhokhar ▴ 150

Hey, UniProt ID mapping is another way to go. http://www.uniprot.org/uploadlists/

DAVID can also be used for this purpose. https://david.ncifcrf.gov/summary.jsp

ADD COMMENT
0
Entering edit mode

Thanks. I don't think they work. DAVID does not seem to have an option for PDB, and UniProt seems to map everything to UniProt IDs.

ADD REPLY
0
Entering edit mode

Just copy and paste the following link into your browser. It will give you all the PDB IDs corresponding to human proteins, which you can then match against your list of IDs.

http://www.uniprot.org/uniprot/?query=human&fil=organism%3A%22Homo+sapiens+%28Human%29+%5B9606%5D%22&sort=score&format=tab&columns=id,genes(OLN),%2Cgenes(PREFERRED),%2Cdatabase(PDB)
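A tab-delimited table like the one that link returns can be turned into a gene symbol to PDB ID lookup with a short script. The sketch below is not an official UniProt client. It assumes the file has one column of gene names and one column of PDB cross-references, with multiple values separated by semicolons. The column headers `Gene` and `PDB` and the sample rows are hypothetical, so check the header line of your own download and pass the real names via `gene_col` and `pdb_col`.

```python
import csv
import io
from collections import defaultdict


def gene_to_pdb(tsv_text, gene_col="Gene", pdb_col="PDB"):
    """Return {gene symbol: sorted list of PDB IDs} from a tab-delimited export."""
    mapping = defaultdict(set)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Both columns may hold several values separated by ';' (assumed format).
        genes = [g.strip() for g in (row.get(gene_col) or "").split(";") if g.strip()]
        pdbs = [p.strip().upper() for p in (row.get(pdb_col) or "").split(";") if p.strip()]
        for gene in genes:
            # Genes without any PDB ID still get an (empty) entry.
            mapping[gene].update(pdbs)
    return {gene: sorted(ids) for gene, ids in mapping.items()}


# Made-up rows for illustration only; accessions and PDB IDs are placeholders.
example = (
    "Entry\tGene\tPDB\n"
    "P00001\tGENE1\t1ABC;2DEF;\n"
    "P00002\tGENE2\t\n"
    "P00003\tGENE1\t3GHI;\n"
)
table = gene_to_pdb(example)
for gene, pdb_ids in table.items():
    print(gene, pdb_ids if pdb_ids else "-")
```

Note that one gene can map to many PDB entries and many genes map to none, so a dictionary of lists fits the data better than a one-to-one table.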

ADD REPLY
0
Entering edit mode

UniProt ID mapping works for me; however, it would be nice to be able to do that programmatically via a SPARQL query.

ADD REPLY
1
Entering edit mode
8.5 years ago

Just to complement ahmedakhokhar's useful answer:

You can upload your list of genes to the UniProt ID mapping (http://www.uniprot.org/uploadlists) and map them to UniProtKB. Once you have your result, you can add additional criteria, e.g. "AND database:(type:pdb)" and customize your output by adding or removing columns (cf http://www.uniprot.org/help/customize or

Then click on "Download" and select tab-delimited format. You may use the "preview first 10" option to double-check what your download will look like before you proceed to the full download.

ADD COMMENT
0
Entering edit mode
8.5 years ago
Denise CS ★ 5.2k

Yes, you can map gene symbols to PDB IDs. Try BioMart. If you know R, use biomaRt in Bioconductor, or try converting HGNC IDs to PDBe IDs using the web interface of BioMart.
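If you would rather script this than click through the web interface, BioMart also accepts queries written as XML. The Python sketch below only builds the query and the request URL; it does not contact the server. The service URL, the dataset name `hsapiens_gene_ensembl`, and the filter and attribute names `hgnc_symbol` and `pdb` are assumptions that should be verified against the filter and attribute lists shown in the BioMart interface. `build_biomart_query` and `build_request_url` are hypothetical helper names.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumed service endpoint; check the Ensembl BioMart help pages before relying on it.
MART_URL = "https://www.ensembl.org/biomart/martservice"


def build_biomart_query(symbols, dataset="hsapiens_gene_ensembl"):
    """Build a BioMart XML query asking for gene symbol and PDB ID columns."""
    query = ET.Element(
        "Query",
        virtualSchemaName="default",
        formatter="TSV",
        header="1",
        uniqueRows="1",
    )
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    # Restrict the result to the supplied symbols (assumed filter name).
    ET.SubElement(ds, "Filter", name="hgnc_symbol", value=",".join(symbols))
    # Output columns (assumed attribute names).
    ET.SubElement(ds, "Attribute", name="hgnc_symbol")
    ET.SubElement(ds, "Attribute", name="pdb")
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def build_request_url(symbols):
    """Return a GET URL carrying the XML query in the 'query' parameter."""
    params = urllib.parse.urlencode({"query": build_biomart_query(symbols)})
    return MART_URL + "?" + params


xml_query = build_biomart_query(["NAT2", "ADA", "CDH2"])
print(xml_query)
print(build_request_url(["NAT2", "ADA", "CDH2"]))
```

With a list of several thousand symbols the URL will probably get too long, so you would likely need to send the symbols in batches. The returned table should have one row per symbol and PDB ID pair, with genes that have several structures appearing on several rows.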

ADD COMMENT
0
Entering edit mode

Is there a good tutorial for this?

ADD REPLY
0
Entering edit mode

Ensembl tutorials. BioMart tutorial included.

ADD REPLY
