What is the command in PyMOL for listing all the amino acids in a specific protein, say, 1a62.pdb?
Use get_fastastr(). See https://pymolwiki.org/index.php/Get_fastastr:
PyMOL>fetch 1a62
TITLE CRYSTAL STRUCTURE OF THE RNA-BINDING DOMAIN OF THE TRANSCRIPTIONAL TERMINATOR PROTEIN RHO
ExecutiveLoad-Detail: Detected mmCIF
CmdLoad: "./1a62.cif" loaded as "1a62".
PyMOL>print(cmd.get_fastastr('all'))
>1a62_A
?NLTELKNTPVSELITLGEN?GLENLAR?RKQDIIFAILKQHAKSGEDIFGDGVLEILQDGFGFLRSADS
SYLAGPDDIYVSPSQIRRFNLRTGDTISGKIRPPKEGERYFALLKVNEVNFDKPENARNK
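If a Python list is more convenient than a FASTA string, the text returned by get_fastastr() can be split into residues. This is a minimal sketch in plain Python that assumes the output has the shape shown above; fasta_to_residues is a hypothetical helper name, not part of PyMOL.

```python
def fasta_to_residues(fasta_text):
    """Map each FASTA header (without '>') to a list of one-letter codes."""
    records = {}
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Start a new record, e.g. "1a62_A"
            current = line[1:]
            records[current] = []
        elif current is not None:
            # Sequence lines may wrap, so keep extending the same record
            records[current].extend(line)
    return records


# Inside PyMOL the input would come from cmd.get_fastastr('all');
# here the text from the session above is pasted in directly.
example = """>1a62_A
?NLTELKNTPVSELITLGEN?GLENLAR?RKQDIIFAILKQHAKSGEDIFGDGVLEILQDGFGFLRSADS
SYLAGPDDIYVSPSQIRRFNLRTGDTISGKIRPPKEGERYFALLKVNEVNFDKPENARNK
"""
residues = fasta_to_residues(example)
print(residues["1a62_A"][:5])  # ['?', 'N', 'L', 'T', 'E']
```

Characters such as ? are kept as they are, so positions in the list still line up with the FASTA output.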
If you do want an actual list of the amino acids represented in the structure while in PyMOL, or via its API, an alternative to get_fastastr(), pointed out by Julian, is to iterate over the residues using PyMOL's iterate command. An added benefit is that the iterate command exposes additional variables you can access while iterating to get further details about individual residues. The additional variables are listed here. For example, to iterate over each residue in chain A and get a list of the amino acid, residue number, and type of secondary structure each residue occurs in, use:
secondary_structure_list_by_aa_resnumber = []
iterate (chain A and name ca), secondary_structure_list_by_aa_resnumber.append((oneletter,resv,ss))
print (secondary_structure_list_by_aa_resnumber)
Iterating via PyMOL's API is demonstrated here. You can get an active form of that Jupyter notebook in a temporary session served via MyBinder.org by going here, clicking the launch binder badge, and then selecting the notebook entitled 'Demo of Iterating over residue secondary structure' from the list of available notebooks after the session launches.
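For a rough idea of what the API form of the iterate call above might look like in a plain Python script, here is a minimal sketch. It relies on the space argument of cmd.iterate to hand a Python list to the expression; residues_with_ss is a hypothetical helper name, and the default selection simply mirrors the chain A example.

```python
def residues_with_ss(cmd, selection="chain A and name CA"):
    """Collect (one-letter code, residue number, secondary structure) per residue.

    `cmd` is expected to be PyMOL's command module (from pymol import cmd).
    Selecting only alpha carbons yields one entry per residue.
    """
    rows = []
    # The expression runs once per selected atom; `space` exposes `rows` to it.
    cmd.iterate(
        selection,
        "rows.append((oneletter, resv, ss))",
        space={"rows": rows},
    )
    return rows
```

In a script run with PyMOL's Python, this would be called as residues_with_ss(cmd) after from pymol import cmd and cmd.fetch("1a62"). Passing cmd in as an argument is only a design choice to keep the helper easy to try out with a stand-in object.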
A couple of things to bear in mind:
Display > Sequence.
PDBsum will provide this information for every structure in the Protein Data Bank and at the same time make it much clearer what is or isn't represented in both the sequence and structure context.
From the main PDBsum page of your example 1a62, you can see that although the E. coli transcription termination factor rho is 419 amino acids long, only 125 amino acids at the N-terminus are represented in this structure. The 'Protein' tab will provide details of the specific residues represented. If there were gaps caused by unrepresented residues internal to the chain, they would show in this view as a break in the secondary structure representation and an absence of the letters for the amino acids in that region. See the 'Protein' tab of 2ace here for an example where 485-489 are missing. See here for more about such gaps.
When you open a structure in the viewer FirstGlance in Jmol, it immediately highlights regions with missing residues using mesh baskets, as they are often easy to miss when first scanning the 3D structure. You can click on 'Missing Residues' for a report on them. For example, the model 2ace is missing 10 residues of the protein (1-3, 485-489, and 536-537 in chain A), including 3 negatively charged amino acids.
While on the 'Protein' tab for a chain at PDBsum, the FASTA file for the sequence represented in the structure model for that chain can be obtained by clicking on the file icon to the right of the top line of the secondary structure, just to the left of the wire diagram of the topology. The URL of that page can be used to work out the pattern for submitting and retrieving this information computationally using wget or curl on the command line without using a browser.
Is there any way to see the 3-letter names rather than 1-letter symbols?
You can use a web tool to convert the sequence, such as One to three. Or inside PyMOL, use this code for your chain if it is chain A:
Or if you prefer no spaces and only the first letter capitalized, as One to three yields, use this in PyMOL:
Those are based on my answer here about iterating.
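As a minimal sketch of both output styles, written against the PyMOL Python API rather than the PyMOL command line: resn is the iterate variable holding the three-letter residue name, and the helper names three_letter_names, as_spaced, and as_compact are hypothetical.

```python
def three_letter_names(cmd, selection="chain A and name CA"):
    """Return the three-letter residue names, one per alpha carbon selected.

    `cmd` is expected to be PyMOL's command module (from pymol import cmd).
    """
    names = []
    cmd.iterate(selection, "names.append(resn)", space={"names": names})
    return names


def as_spaced(names):
    """Upper-case names separated by spaces, e.g. 'ASN LEU THR'."""
    return " ".join(names)


def as_compact(names):
    """No spaces, only the first letter capitalized, e.g. 'AsnLeuThr'."""
    return "".join(name.capitalize() for name in names)


print(as_spaced(["ASN", "LEU", "THR"]))   # ASN LEU THR
print(as_compact(["ASN", "LEU", "THR"]))  # AsnLeuThr
```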
Change the chain designation to match the one of interest to you if it isn't chain A.
I am not working on the web. I am working on a terminal and running Python scripts.
I also linked in my answer an example of how you can use the PyMOL iterate code inside a Python script. With that as a guide, you could adapt the code I supplied for getting resn for each residue to the PyMOL API.
Other alternatives are provided on the PyMOL wiki aa page that would allow taking what is returned by get_fastastr and converting it. An example based on that using a Python dictionary:
That gives, from the original example with 1a62_A, where the FASTA was saved as test.fa:
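A minimal sketch of a dictionary-based conversion along those lines is given here. It is not necessarily the code from the wiki page: fasta_to_three_letter is a hypothetical name, and mapping any character outside the 20 standard codes (such as the ? in the get_fastastr output) to UNK is an assumption.

```python
# One-letter to three-letter codes for the 20 standard amino acids
ONE_TO_THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def fasta_to_three_letter(fasta_text, unknown="UNK"):
    """Convert FASTA text to a list of three-letter codes.

    Header lines are skipped, so multiple records are concatenated.
    Unrecognized characters become `unknown` (an assumption, see above).
    """
    codes = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        codes.extend(ONE_TO_THREE.get(ch, unknown) for ch in line.upper())
    return codes


# First residues of the 1a62_A record shown in the get_fastastr output
print(" ".join(fasta_to_three_letter(">1a62_A\n?NLTELKNTPV\n")))
# UNK ASN LEU THR GLU LEU LYS ASN THR PRO VAL
```

With the FASTA saved as test.fa, the input text could be read with open("test.fa").read() and passed to the same function.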