List Of Sequence Unique Pdb Chains
13.2 years ago
Jonasr ▴ 120

Hi everyone,

I need a list of which PDB chains are sequence-unique.

For example, PDB entry 2r9r has 4 chains, but some of them have identical SEQRES sequences. See here. It says:

The structure 2R9R has in total 4 chains. These are represented by 2 sequence-unique entities.

The chains are A, B, G and H, where A is identical to G and B is identical to H. I want the following sets:
2r9r: (A,G), (B,H)

The naive approach would be downloading the FASTAs and doing the check myself. This file might be a suitable source:

ftp://ftp.wwpdb.org/pub/pdb/derived_data/pdb_seqres.txt

I'm wondering, though, whether such a list already exists. I'm also open to suggestions on how best to build it myself.

Edit: To clarify: I can compile this myself; the main question is whether the list is already available. That would be more convenient than iterating over the full sequence data every time something changes.

Thanks,
Jonas

13.2 years ago

I'm not sure I understand your problem or how the file relates to it. Do you just need the following command line?

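# Flatten each FASTA record onto one tab-separated line, then keep one record per distinct sequence (field 2)
curl -s "ftp://ftp.wwpdb.org/pub/pdb/derived_data/pdb_seqres.txt" |\
tr "\n>" "\t\n"  |\
sort -t $'\t' -k2,2 -u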

Hi, not exactly. I tried to clarify in the question. I would expect your command line to return essentially 2r9r: A, B. Interestingly, it returns only 2r9r_B; maybe I don't completely understand it. Anyway, what I need is, for every chain ID, the list of other chain IDs it is identical to. I hope that makes it clearer.

13.2 years ago
Chris • 0

CD-HIT might help you. It essentially clusters sequences based on a given sequence identity threshold.


To get what I want I would have to run it at 100% identity. Then I could equally well just search for perfect matches in any scripting language of choice.
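The exact-match search mentioned here could be sketched roughly as follows in Python. This is only a minimal illustration, not an official PDB tool: the function names are made up for the example, the header layout (`>2r9r_A mol:protein ...`) is an assumption about how pdb_seqres.txt is formatted, and the demo sequences are short placeholders rather than the real 2r9r sequences.

```python
from collections import defaultdict


def parse_seqres(lines):
    """Yield (pdb_id, chain_id, sequence) from pdb_seqres.txt-style FASTA lines.

    Assumption: each header starts with ">" + "<pdb_id>_<chain_id>".
    """
    header = None
    seq_parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield (*header, "".join(seq_parts))
            entry = line[1:].split()[0]           # e.g. "2r9r_A"
            pdb_id, chain_id = entry.split("_", 1)
            header = (pdb_id, chain_id)
            seq_parts = []
        else:
            seq_parts.append(line)
    if header is not None:
        yield (*header, "".join(seq_parts))


def group_identical_chains(lines):
    """Map each PDB ID to tuples of chain IDs that share an identical sequence."""
    by_entry = defaultdict(lambda: defaultdict(list))
    for pdb_id, chain_id, seq in parse_seqres(lines):
        by_entry[pdb_id][seq].append(chain_id)
    return {
        pdb_id: [tuple(chains) for chains in by_seq.values()]
        for pdb_id, by_seq in by_entry.items()
    }


# Placeholder records (sequences are invented, not the real 2r9r data).
demo_lines = [
    ">2r9r_A mol:protein length:5  PLACEHOLDER",
    "AAAAA",
    ">2r9r_B mol:protein length:5  PLACEHOLDER",
    "CCCCC",
    ">2r9r_G mol:protein length:5  PLACEHOLDER",
    "AAAAA",
    ">2r9r_H mol:protein length:5  PLACEHOLDER",
    "CCCCC",
]
print(group_identical_chains(demo_lines))
```

For the real file, the same function can be fed an open file handle instead of the list. Grouping is done per entry, so chains from different PDB entries are never merged even if their sequences match.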


True. So? I'm not quite sure I understand why you want a precompiled set when you could get what you want with a simple script or CD-HIT.


You're right, it's no big deal. I just would have preferred it to downloading the full set of PDB FASTA sequences every time. Since they show that information on their website, it seemed reasonable to me that it would be stored somewhere, but that does not seem to be the case.

13.2 years ago

Have you tried the ASTRAL - SCOP Original style datasets?

13.2 years ago
Neilfws 49k

See the NCBI Non-redundant PDB chain set.

A summary data file is available from the FTP site. Note: check the dates of the files; the latest is called nrpdb.090711.

The file is plain text with 26 columns. Entries for your PDB example look like:

# 1  2       3     4    5 6     7    8 9     A    B C     D    E F    G      H      I      J     K     L   M   N     O P Q
2R9R B   60830   167   43 0   446   29 0   474    3 0 30897    1 1   0.00  27.82  24.90   3.89  2.40   4  27   3   514 X a
2R9R H   60830   167   44 0   446   30 0   474    4 0 30897    2 0   0.00  29.97  29.38   0.83  2.40   4  27   3   514 X a
2R9R A   60830   200    6 0  2531    4 0  2379    4 0 21166    1 1   0.00   2.10   2.10   0.00  2.40   4  27   3   333 X a
2R9R G   60830   200    7 0  2531    5 0  2379    5 0 21166    2 0   0.00   2.10   2.10   0.00  2.40   4  27   3   333 X a

where columns 4, 7, A and D are Group IDs (at various clustering levels), showing that chains (B, H) and (A, G) are grouped.
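As a rough sketch of how those group IDs could be turned into chain sets, the Python below groups chains of the same entry by one group-ID column. The function name and the `group_col` parameter are hypothetical; the default index 12 corresponds to column D in the header shown above, and choosing D as the most suitable level is an assumption, since the clustering levels are not spelled out here. It also assumes whitespace-separated fields with a non-blank chain ID.

```python
from collections import defaultdict


def group_chains_by_column(lines, group_col=12):
    """Group chains of each PDB entry by the group ID in one column.

    group_col is a zero-based field index; 12 is column "D" in the
    header above (columns 4, 7 and A would be 3, 6 and 9).
    """
    groups = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header/comment lines
        fields = line.split()
        pdb_id, chain_id = fields[0], fields[1]
        groups[(pdb_id, fields[group_col])].append(chain_id)

    result = defaultdict(list)
    for (pdb_id, _group_id), chains in groups.items():
        result[pdb_id].append(tuple(chains))
    return dict(result)


# The four 2R9R rows quoted above.
sample = """\
2R9R B   60830   167   43 0   446   29 0   474    3 0 30897    1 1   0.00  27.82  24.90   3.89  2.40   4  27   3   514 X a
2R9R H   60830   167   44 0   446   30 0   474    4 0 30897    2 0   0.00  29.97  29.38   0.83  2.40   4  27   3   514 X a
2R9R A   60830   200    6 0  2531    4 0  2379    4 0 21166    1 1   0.00   2.10   2.10   0.00  2.40   4  27   3   333 X a
2R9R G   60830   200    7 0  2531    5 0  2379    5 0 21166    2 0   0.00   2.10   2.10   0.00  2.40   4  27   3   333 X a
""".splitlines()

print(group_chains_by_column(sample))
```

Keying on the pair (PDB ID, group ID) keeps the result restricted to chains within one entry, which matches the sets asked for in the question.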

However, as others have stated, it really is very quick and easy to generate this kind of data from a file of PDB chain sequences and a tool such as CD-HIT.


This seems to be the closest solution. In the end, the PDB itself does not seem to export this information. Thanks for pointing out the date of the files. The best solution for up-to-date information should indeed be compiling it myself.
