Why are there multiple versions of a protein sequence in a FASTA file downloaded from PDB?
9.5 years ago

Sorry if the question is trivial; I searched but didn't find an explanation. Recently I downloaded the 4HHB protein sequence from PDB, but the file actually contains four sequences named 4HHB:A, 4HHB:B, 4HHB:C, and 4HHB:D. It looks like this:

>4HHB:A|PDBID|CHAIN|SEQUENCE
A sequence....
>4HHB:B|PDBID|CHAIN|SEQUENCE
A sequence...
>4HHB:C|PDBID|CHAIN|SEQUENCE
A sequence...
>4HHB:D|PDBID|CHAIN|SEQUENCE
A sequence...

I would appreciate it if someone could point me to a resource or explain what exactly those four sequences are and how they differ.

PDB Protein sequence • 2.9k views
9.5 years ago
GenoMax 147k

Data in PDB is represented in asymmetric units which are put together to make the biological assemblies. There is a nice tutorial here: http://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/biological-assemblies

The protein you are referring to has a total of 4 chains (hence the 4 sequences), but they represent only 2 unique sequences (A/C are identical, as are B/D): http://www.rcsb.org/pdb/explore/remediatedSequence.do?params.chainEntityStrategyStr=first&structureId=4HHB
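
If you want to check this yourself, here is a minimal Python sketch that groups the chains of a FASTA file by identical sequence. The helper names `read_fasta` and `group_identical_chains` are made up for illustration, and the sequences in `toy_fasta` are short placeholders, not the real 4HHB sequences; only the header layout is taken from the file shown in the question.

```python
from collections import defaultdict


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def group_identical_chains(records):
    """Group chain IDs (text before the first '|') by identical sequence."""
    groups = defaultdict(list)
    for header, seq in records:
        chain_id = header.split("|")[0]
        groups[seq].append(chain_id)
    return list(groups.values())


# Placeholder sequences (NOT the real 4HHB data), arranged so that
# A/C share one sequence and B/D share another.
toy_fasta = """>4HHB:A|PDBID|CHAIN|SEQUENCE
AAAAGGGG
>4HHB:B|PDBID|CHAIN|SEQUENCE
CCCCTTTT
>4HHB:C|PDBID|CHAIN|SEQUENCE
AAAAGGGG
>4HHB:D|PDBID|CHAIN|SEQUENCE
CCCCTTTT
"""

for chains in group_identical_chains(read_fasta(toy_fasta)):
    print(chains)
```

To run it on the real download, read the file's contents into a string and pass that to `read_fasta` instead of `toy_fasta`; chains with the same sequence should then end up in the same group.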


Thanks a lot.

9.5 years ago
roy.granit ▴ 890

Each of these represents a separate peptide chain; together the chains make up the entire protein, e.g. 4HHB:A = chain #1. This protein has a total of four chains.


Thanks a lot, roy.
