Renumber PDB Files To Match Actual Sequence
12.4 years ago
Whetting ★ 1.6k

Hi,
I am working on a project that compiles papillomavirus sequence information. I will gladly share the link if people are interested, but I do not want to spam. As part of the effort, we want to show alignments between PDB structure files and HPV sequences.
We noticed that several PDB files are not numbered according to the actual genome. For example, if only the C-terminal domain of protein X was crystallized, the residues should be numbered 250 to 500, but the crystallographer numbered the PDB file according to the crystallized peptide. Does anyone have suggestions for a program that can do the renumbering? Thanks!

EDIT: I think I may have found a solution.
I think I can write a tool using pdbsws and a Perl script I found here: http://www.canoz.com/sdh/renumberpdbchain.pl

pdb sequence • 7.0k views
12.4 years ago

PDB numbering is sometimes quite a mess. I have used protein alignment, but that approach is impractical across the full PDB database. Take a look at the pdbsws service.


That's pretty cool, wish I had known about that one earlier!

12.4 years ago
Will 4.6k

I've come across the same problem. My method has been to align (using a local alignment) the PDB sequences with the relevant protein sequences and determine the proper numbering from there. I wrote a simple Matlab script to do the re-numbering but any language should work just as well.

Also, don't forget to account for gaps in the PDB sequences. I've found many instances where the crystal structure is missing parts in the middle.
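The alignment approach described above can be sketched in Python. This is not the Matlab script mentioned in the answer; it is a minimal Smith-Waterman local alignment with a linear gap penalty, and the function name `local_alignment_map`, the scoring values, and the example sequences are all assumptions made for illustration. It returns which position in the full-length protein each PDB residue aligns to, and it tolerates a missing stretch in the middle of the crystallized sequence.

```python
def local_alignment_map(pdb_seq, full_seq, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns {pdb_index: full_index} (both 0-based) for every aligned
    residue pair. Scoring values are illustrative assumptions.
    """
    rows, cols = len(pdb_seq), len(full_seq)
    score = [[0] * (cols + 1) for _ in range(rows + 1)]
    best, best_cell = 0, (0, 0)

    def pair_score(i, j):
        return match if pdb_seq[i - 1] == full_seq[j - 1] else mismatch

    # Fill the scoring matrix and remember the highest-scoring cell.
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + pair_score(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_cell = score[i][j], (i, j)

    # Trace back from the best cell until the local alignment ends.
    mapping = {}
    i, j = best_cell
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            mapping[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1  # PDB residue with no partner in the full sequence
        else:
            j -= 1  # full-sequence residue missing from the structure
    return mapping


# Hypothetical example: the structure covers residues 5-12 and 18-30
# of the full protein, with residues 13-17 missing from the crystal.
full_seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
pdb_seq = "YIAKQRQI" + "HFSRQLEERLGLI"
mapping = local_alignment_map(pdb_seq, full_seq)
numbering = {i + 1: j + 1 for i, j in mapping.items()}  # 1-based positions
```

The matrix uses memory proportional to the product of the two sequence lengths, which is fine for single proteins; for large batches an established alignment library is likely the better choice.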


Hi Will, the problem I ran into was that it seemed impossible to completely renumber the entire PDB file, i.e. the helix and sheet records have to be renumbered as well. Did you write a script that updated all those lines, or is that not necessary to parse the PDB file?
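For anyone who does want to rewrite the file, here is a rough Python sketch of updating the residue numbers in the secondary-structure records along with the coordinate records. The column positions are taken from the fixed-width PDB format; the names `RESIDUE_FIELDS`, `renumber_line`, and `offset_map` are hypothetical, and only ATOM, HETATM, ANISOU, TER, HELIX, and SHEET are handled. Other records that carry residue numbers (SSBOND, LINK, SITE, and so on) are left untouched in this sketch.

```python
# Residue-number fields per record type as (chain_col, start, end),
# given as 0-based Python slice positions into the fixed-width line.
RESIDUE_FIELDS = {
    "ATOM": [(21, 22, 26)],
    "HETATM": [(21, 22, 26)],
    "ANISOU": [(21, 22, 26)],
    "TER": [(21, 22, 26)],
    "HELIX": [(19, 21, 25), (31, 33, 37)],
    "SHEET": [(21, 22, 26), (32, 33, 37), (49, 50, 54), (64, 65, 69)],
}


def renumber_line(line, number_map):
    """Return the line with residue numbers replaced via number_map.

    number_map maps (chain_id, old_number) to new_number. Residues that
    are not in the map keep their original number.
    """
    line = line.rstrip("\r\n")
    record = line[:6].strip()
    for chain_col, start, end in RESIDUE_FIELDS.get(record, []):
        field = line[start:end]
        if len(field) < 4 or not field.strip():
            continue  # short line, or an optional field left blank
        new_number = number_map.get((line[chain_col], int(field)))
        if new_number is None:
            continue
        if not -999 <= new_number <= 9999:
            raise ValueError("residue number does not fit in 4 columns")
        line = line[:start] + f"{new_number:>4}" + line[end:]
    return line


# The case from the question: a 251-residue peptide on chain A that
# should be numbered 250 to 500 (assumes a constant offset, no gaps).
offset_map = {("A", n): n + 249 for n in range(1, 252)}
```

Applying `renumber_line` to every line of the file and writing the results back out gives a renumbered copy. A constant offset only works when the structure has no internal gaps relative to the full sequence; otherwise the map has to come from an alignment.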


Essentially, I just use the script to write out the position (X, Y, Z), chain, original index, and full-protein index of each amino acid to a separate file. I then used those for my downstream analysis; I didn't try to write anything back into the PDB file.
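A side table like the one described here could be produced along these lines in Python. This is only a sketch of the idea, not the original script: the function names `residue_table` and `write_table`, the choice of the C-alpha atom as the residue position, and the CSV column names are assumptions.

```python
import csv


def residue_table(pdb_lines, number_map):
    """Collect one row per residue from the C-alpha ATOM records.

    number_map maps (chain_id, pdb_number) to the full-protein number;
    residues missing from the map get None in that column.
    """
    rows = []
    for line in pdb_lines:
        # Atom name occupies columns 13-16 of the fixed-width record.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain = line[21]
        pdb_number = int(line[22:26])
        x = float(line[30:38])
        y = float(line[38:46])
        z = float(line[46:54])
        full_number = number_map.get((chain, pdb_number))
        rows.append((chain, pdb_number, full_number, x, y, z))
    return rows


def write_table(rows, handle):
    """Write the rows as CSV to an open text file handle."""
    writer = csv.writer(handle)
    writer.writerow(["chain", "pdb_resseq", "full_resseq", "x", "y", "z"])
    writer.writerows(rows)
```

Keeping the mapping in a separate file leaves the original PDB file untouched, so records such as HELIX and SHEET never need to be rewritten.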
