3.5 years ago
gundalav
I am looking at a particular protein structure, 2LY4, available from the RCSB PDB website. The corresponding FASTA sequences for that structure are:
>2LY4_1|Chain A|High mobility group protein B1|Homo sapiens (9606)
GKGDPKKPRGKMSSYAFFVQTCREEHKKKHPDASVNFSEFSKKCSERWKTMSAKEKGKFEDMAKADKARYEREMKTYIPPKGE
>2LY4_2|Chain B|Cellular tumor antigen p53|Homo sapiens (9606)
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPL
And the PDB format file can be downloaded here.
What I want to do is extract the part of the PDB file that corresponds to a subsequence of the FASTA above,
namely Chain A, from the 1st residue to the 30th residue:
GKGDPKKPRGKMSSYAFFVQTCREEHKKKH
How can I do that in R or Python?
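To make the request concrete, here is a minimal sketch in plain Python of the kind of extraction I have in mind. It assumes the standard fixed-column PDB layout (residue name in columns 18-20, chain ID in column 22, residue number plus insertion code in columns 23-27), counts residues by order of appearance rather than by their PDB numbering, and only looks at the first model. The function name `extract_residues` and the lookup table are my own placeholders, not part of any library.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def extract_residues(pdb_lines, chain_id="A", start=1, end=30):
    """Return (kept_lines, sequence) for residues start..end of one chain.

    Positions are 1-based and counted in order of appearance within the
    chain, so they do not depend on the residue numbers stored in the file.
    Only ATOM records of the first model are considered (hypothetical helper).
    """
    kept = []          # ATOM lines belonging to the requested residues
    sequence = []      # one-letter codes of the requested residues
    position = 0       # how many residues of this chain have been seen
    last_key = None    # residue number + insertion code of the previous atom
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # stop after the first model (NMR entries have many)
        if not line.startswith("ATOM"):
            continue
        if line[21:22] != chain_id:
            continue
        key = line[22:27]
        if key != last_key:
            # A new residue starts here.
            last_key = key
            position += 1
            if start <= position <= end:
                sequence.append(THREE_TO_ONE.get(line[17:20], "X"))
        if start <= position <= end:
            kept.append(line)
    return kept, "".join(sequence)
```

With a locally downloaded file (say `2ly4.pdb`, a hypothetical filename), one would read the lines with `open("2ly4.pdb").read().splitlines()`, call `extract_residues(lines, "A", 1, 30)`, compare the returned sequence with the 30-residue string above, and write the kept lines to a new file. This only works if every residue of the FASTA sequence actually has coordinates in the file, which I have not checked for this entry.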
Also see other answers on Bioinformatics Stack Exchange.