How to extract a subset of a protein structure (PDB format) file based on a subsequence of the protein
3.5 years ago
gundalav ▴ 380

I am looking at a particular protein structure, 2LY4, available from the RCSB PDB website. The corresponding FASTA sequences for that structure are:

>2LY4_1|Chain A|High mobility group protein B1|Homo sapiens (9606)
GKGDPKKPRGKMSSYAFFVQTCREEHKKKHPDASVNFSEFSKKCSERWKTMSAKEKGKFEDMAKADKARYEREMKTYIPPKGE
>2LY4_2|Chain B|Cellular tumor antigen p53|Homo sapiens (9606)
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPL

And the PDB format file can be downloaded here.

What I want to do is extract the part of the PDB file that corresponds to a subsequence of the FASTA above, namely Chain A from the 1st to the 30th residue:

GKGDPKKPRGKMSSYAFFVQTCREEHKKKH

How can I do that in R or Python?

protein pdb python r • 1.7k views
3.5 years ago
jgreener ▴ 390

I would use Biopython. See the tutorial PDF page 192 for information on how to write out part of the structure. In this case you'll want to write an accept_residue function based on the residue number.
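A minimal sketch of that approach, assuming Biopython is installed and that chain A in the file is numbered from 1, so that residue numbers 1 to 30 match the first 30 letters of the FASTA sequence (check the ATOM records, since PDB numbering does not always start at 1). The class name ChainRangeSelect and the function extract_range are made up for illustration; Select, PDBParser and PDBIO are from Bio.PDB.

```python
from Bio.PDB import PDBIO, PDBParser, Select


class ChainRangeSelect(Select):
    """Keep residues start..end (by PDB residue number) of one chain."""

    def __init__(self, chain_id, start, end):
        self.chain_id = chain_id
        self.start = start
        self.end = end

    def accept_chain(self, chain):
        # Drop every chain except the requested one.
        return chain.id == self.chain_id

    def accept_residue(self, residue):
        # residue.id is (hetero flag, residue number, insertion code).
        hetflag, resseq, _icode = residue.id
        # Assumption: the PDB residue number equals the FASTA position.
        return hetflag == " " and self.start <= resseq <= self.end


def extract_range(in_pdb, out_pdb, chain_id="A", start=1, end=30):
    """Write only the selected residues of in_pdb to out_pdb."""
    structure = PDBParser(QUIET=True).get_structure("query", in_pdb)
    writer = PDBIO()
    writer.set_structure(structure)
    writer.save(out_pdb, ChainRangeSelect(chain_id, start, end))
```

With the file saved locally under a name such as 2ly4.pdb (hypothetical), calling extract_range("2ly4.pdb", "2ly4_A_1-30.pdb") should write the subset. If the file contains several models, the same selection is applied to each of them.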

Also see other answers on Bioinformatics Stack Exchange.
