I am using a custom method to predict helices and sheets (HELIX/SHEET) in proteins and want to compare my results with what is stored in the PDB file.
For this I compare the residue positions (coordinates) reported by both methods. The problem I run into is that residue numbering in PDB files frequently does not start at 1 (e.g. 3LX3.pdb starts at 669), while my data always starts at 1.
Is there an easy way to get the offset between the two files? I was thinking of doing an alignment of both sequences, but was wondering if there is an easier way.
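A minimal Python sketch of the simplest case, assuming a plain-text PDB file and a single chain: read the author residue numbers from the C-alpha ATOM/HETATM records (resSeq is columns 23-26, the insertion code is column 27) and take the first number minus 1 as the offset. `read_ca_residues` and `naive_offset` are hypothetical helpers, not part of any library, and the offset is only meaningful if the chain is numbered contiguously.

```python
def read_ca_residues(pdb_text, chain):
    """Return (resSeq, iCode, resName) for every C-alpha of `chain`, in file order."""
    residues = []
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # multi-model (e.g. NMR) entries: only read the first model
        if not line.startswith(("ATOM  ", "HETATM")) or len(line) < 27:
            continue
        # " CA " is the C-alpha atom name; calcium would be "CA  "
        if line[12:16] != " CA " or line[21] != chain:
            continue
        if line[16] not in (" ", "A"):
            continue  # keep only the first alternate location
        residues.append((int(line[22:26]), line[26].strip(), line[17:20]))
    return residues


def naive_offset(residues):
    """Offset such that my_index (1-based) + offset == PDB resSeq.

    Only valid if numbering is contiguous, with no gaps or insertion codes.
    """
    return residues[0][0] - 1
```

For a chain whose numbering starts at 669, as in the 3LX3 example from the question, this gives an offset of 668, so residue 1 in your data would correspond to PDB residue 669.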
updated 13.4 years ago by Faezeh • written 13.9 years ago by Hucks
Even worse, there are PDB files with gaps in the numbering. In one case, the numbering went backward for one part of the chain. The reason for this apparent insanity is that crystallographers sometimes number residues based on a reference sequence. I would recommend searching for the answer to this question on the PDB mailing list, where I've seen it asked before.
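Before trusting a single offset, it can help to scan the residue IDs for the irregularities described above. The sketch below (the function `numbering_problems` is hypothetical) takes (resSeq, iCode) pairs in file order and reports insertion codes, backward jumps and skipped numbers. An empty result only means a constant offset is consistent with the numbering; it says nothing about whether the sequences themselves agree.

```python
def numbering_problems(res_ids):
    """List reasons why a single constant offset cannot map 1..n onto res_ids.

    res_ids: author residue IDs as (resSeq, iCode) pairs in file order,
    e.g. [(669, ""), (670, ""), (670, "A")]; iCode is "" when blank.
    """
    problems = []
    for k, (num, icode) in enumerate(res_ids):
        if icode:
            problems.append(f"insertion code at {num}{icode}")
        if k == 0:
            continue
        prev = res_ids[k - 1][0]
        if num < prev:
            problems.append(f"numbering goes backward: {prev} -> {num}")
        elif num > prev + 1:
            # may be unmodelled residues or simply a jump in the numbering
            problems.append(f"gap: {prev} -> {num} (skips {num - prev - 1} numbers)")
    return problems
```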
I have some UniProtKB accessions that have PDB structures. When I search for one of them in UniProt, for example P32911, I find that it has 2 PDB structures and a length of 127, but when I get its PDB file I see it has more than 127 amino acids. Why?
This is a typical problem I ran into while developing structural-bioinformatics tools. I have tried two approaches:
Renumber the PDB files so that the first residue (x) in the PDB file maps to 1, matching your data.
Keep the PDB files as they are, but modify the program to read residue numbers from the PDB file and use them throughout.
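Both approaches come down to keeping an explicit lookup table between sequential positions (1..n) and the author residue IDs, rather than a single offset, so gaps and insertion codes are handled too. A minimal sketch, assuming residue IDs are (resSeq, iCode) pairs read in file order for one chain; `build_index_maps` and `author_label` are illustrative names, not an existing API.

```python
def build_index_maps(res_ids):
    """Map sequential 1-based indices to author residue IDs and back.

    res_ids: (resSeq, iCode) pairs in file order for one chain.
    """
    index_to_author = {i: rid for i, rid in enumerate(res_ids, start=1)}
    author_to_index = {rid: i for i, rid in index_to_author.items()}
    return index_to_author, author_to_index


def author_label(res_id):
    """Format a residue ID the way it appears in the PDB file, e.g. '52A'."""
    num, icode = res_id
    return f"{num}{icode}"
```

With the first approach you would rewrite residue numbers using `author_to_index`; with the second you leave the file untouched and use `index_to_author` (plus `author_label`) whenever positions are reported to users.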
I found the second approach more effective because I was extending the tools into web apps that display residue numbers to the user, and numbering that starts from 1 is misleading to users interested in specific residues.
Sorry for not posting this as a comment, Khader, but I haven't made an account yet and it appears that I can only comment on specific replies when I am logged in.
In any case, what I probably forgot to mention is that there might be subtle amino acid sequence differences between the two files I am comparing. That is what makes obtaining a mapping between the original PDB file and my data difficult.
Did you just align both sequences to obtain an "offset" so you can map the coordinates?
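When the sequences differ slightly, one option is to align them and map only the aligned, identical positions. The sketch below uses Python's standard-library `difflib.SequenceMatcher` as a simple stand-in for a real pairwise alignment (for example Needleman-Wunsch, as provided by Biopython's `PairwiseAligner`). It only finds runs of identical residues, so a substitution splits a block and the mismatched position stays unmapped. `map_by_alignment` is a hypothetical helper that assumes you already have one-letter sequences and the PDB residue ID of each PDB residue.

```python
from difflib import SequenceMatcher


def map_by_alignment(my_seq, pdb_seq, pdb_res_ids):
    """Map 1-based positions in my_seq to PDB residue IDs via matching blocks.

    pdb_res_ids[k] is the (resSeq, iCode) of pdb_seq[k]. Positions in
    mismatched or unaligned regions are left out of the returned dict.
    """
    if len(pdb_seq) != len(pdb_res_ids):
        raise ValueError("pdb_seq and pdb_res_ids must have the same length")
    # autojunk=False: otherwise, for sequences of 200+ residues, frequent
    # amino acids would be treated as "junk" and ignored when matching.
    matcher = SequenceMatcher(None, my_seq, pdb_seq, autojunk=False)
    mapping = {}
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[block.a + k + 1] = pdb_res_ids[block.b + k]
    return mapping
```

The resulting dict lets you compare predicted and deposited HELIX/SHEET ranges residue by residue, skipping positions where the sequences disagree.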
Just keep an eye on your karma points: when they pass a certain threshold, you gain new powers. See: https://www.biostars.org/info/faq/. For comments, you need 50 points. That actually explains why so many new people use the answer field to comment on things.
Please click "Ask question" to open a new question.