Question

Color PDB According to Amino-Acid Conservation

7

14.0 years ago

Kartik Sunagar ▴ 300

Hi,

Can you guys suggest a tool that can color a PDB structure by amino-acid conservation, using an MSA? I am aware of websites like ConSurf that calculate the evolutionary conservation of amino acids, but I am looking for a simple tool that highlights only the conserved regions (and, if possible, the highly variable ones as well) on the 3D structure of the protein. I am not a bioinformatician and have no programming/scripting knowledge. Doing this manually in PyMOL and similar software is painful...

Thank you, Kartik

amino-acids conservation pdb • 22k views

ADD COMMENT • link updated 11.0 years ago by Jose Manuel Duarte ▴ 340 • written 14.0 years ago by Kartik Sunagar ▴ 300

score 7 · Answer 1 · 2011-08-02

14.0 years ago

Simon Cockell 7.4k

You can do this using Chimera.

1) Load the structure you are interested in.
2) Select 'Tools -> Sequence -> MultAlign Viewer'.
3) Choose the MSA you want to associate with the structure.
4) In the MultAlign Viewer window that should now be visible, select 'Structure -> Associations...' and associate the correct chain in the structure with the correct sequence in the MSA.
5) Now select 'Structure -> Render by conservation' in MultAlign Viewer, and select OK in the dialog box (the default options should be fine for a first pass).
Hey presto!

[Figure: Hemoglobin alpha conservation]

ADD COMMENT • link 14.0 years ago by Simon Cockell 7.4k

0

Awesome! Thanks a lot. I've mostly been using PyMOL because I love the interface and the images it generates. I hope I can find something similar for PyMOL too.

ADD REPLY • link 14.0 years ago by Kartik Sunagar ▴ 300

score 4 · Answer 2 · 2011-08-02

14.0 years ago

Michael Schubert ★ 7.1k

You can also use the ConSurf Server to generate a PDB file with the conservation scores written to the B-factor column, and then use e.g. PyMOL to apply a color gradient according to them.
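If your scores come from somewhere other than ConSurf (for example, from your own MSA), the same B-factor trick can be done by hand. Here is a minimal Python sketch; it is not taken from ConSurf or from any plugin. The function name `write_scores_to_bfactor` and the `scores` dictionary (residue number to score) are made up for illustration, and insertion codes and alternate locations are ignored.

```python
def write_scores_to_bfactor(pdb_lines, scores, chain_id):
    """Return a copy of pdb_lines with scores written into the B-factor field.

    pdb_lines : iterable of PDB-format lines (with or without line endings)
    scores    : dict mapping residue number (int) -> score (float); hypothetical input
    chain_id  : single-letter chain identifier to modify
    """
    out = []
    for line in pdb_lines:
        is_atom = line.startswith(("ATOM", "HETATM")) and len(line) > 26
        if is_atom and line[21] == chain_id:        # column 22: chain ID
            resseq = int(line[22:26])               # columns 23-26: residue number
            if resseq in scores:
                body = line.rstrip("\r\n")
                ending = line[len(body):]           # keep the original line ending
                body = body.ljust(66)               # make sure columns 61-66 exist
                # columns 61-66 hold the temperature (B-) factor
                line = body[:60] + f"{scores[resseq]:6.2f}" + body[66:] + ending
        out.append(line)
    return out
```

After loading the rewritten file in PyMOL, a command such as `spectrum b, blue_white_red, polymer` should color the protein by the stored values.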

I actually wrote a plugin for PyMOL to use the NCBI URLAPI to retrieve sequences, align them, and then use different algorithms to calculate and automatically colour residues according to their conservation scores.

If you're interested, I can ask my supervisor if I'm allowed to give it away.

ADD COMMENT • link 14.0 years ago by Michael Schubert ★ 7.1k

0

Similarly, you can visualize the structure in a tool like VMD and colour it according to the B-factor as well, if you first use ConSurf to map the conservation scores to the B-factor field.

ADD REPLY • link 14.0 years ago by DG 7.3k

0

Hey, that will be awesome! Please do send it if you are allowed (anaturalist@gmail.com).

ConSurf is a handy tool and I have used it, but for some reason I don't find it satisfying. I have often noticed that even highly conserved regions are depicted as variable. The author says this could be because I was working with very few sequences (~15), but I often work with genes for which very few sequences are available in the public databases. I also find it strange that it depicts amino acids as variable even when the position is conserved in all but two sequences of the MSA.

ADD REPLY • link 14.0 years ago by Kartik Sunagar ▴ 300

0

Thank you very much for the reply Michael and Dan.

Cheers, Kartik

ADD REPLY • link 14.0 years ago by Kartik Sunagar ▴ 300

0

Hi Michael, I have just found this post and would be very grateful if you could pass on the plugin to me as well (email d.watterson8@gmail.com). I would also like to know whether there is an easy way to run only the last part of your script (i.e. colouring residues according to conservation), as I already have an MSA file from a non-NCBI database. Many thanks.

ADD REPLY • link 13.2 years ago by dwatterson8 • 0

score 2 · Answer 3 · 2011-08-03

Hello, I use PFAAT for this task and am very pleased with the results. http://pfaat.sourceforge.net/. PFAAT is a multiple sequence alignment (MSA) viewer that is integrated with Jmol to view structures. This gives you the ability to edit the alignment and quickly re-map the conservation scores.

1) Run PFAAT and open an MSA with File -> Open.

2) Right-click the sequence you have a PDB file for and select "Associate PDB Numbering With Structure". Select the chain you are interested in.

3) Calculate the conservation scores with Analysis -> Conservation Scores -> Information Score. There are a couple of options here; Shannon Entropy with the BLOSUM62 similarity matrix works well.

4) View the mapped scores in Jmol with Analysis -> Map conservation scores to structure.

To change from wireframe to spacefill representation, enter the following at the Jmol command prompt:
    select all; wireframe off; spacefill;
If the PDB has waters or other non-protein atoms obscuring your view, remove them with:
    select hetero; spacefill off;
Get rid of any extra chains with:
    select :X; spacefill off;
where X is the chain letter to remove. For more information, refer to the Jmol documentation.

5) To edit the MSA, right-click a sequence name and select "Delete sequence" from the menu, then repeat steps 3 & 4.

6) If you would like to increase sequence conservation, use View -> Sort Sequence By -> Percent Identity of Sequence. Select the sequence with the associated PDB, then delete sequences from the bottom of the list.
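For anyone curious what the score in step 3 measures, here is a rough Python sketch of plain per-column Shannon entropy. This is an illustration only, not PFAAT's implementation: it does not apply the BLOSUM62 similarity weighting, and the function name `column_entropies` is made up. Low entropy means a conserved column, high entropy a variable one.

```python
import math
from collections import Counter


def column_entropies(aligned_seqs, gap_chars="-."):
    """Shannon entropy (in bits) of each alignment column, ignoring gaps.

    aligned_seqs : list of equal-length aligned sequences (strings)
    Returns one float per column, or None for columns that contain only gaps.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    entropies = []
    for column in zip(*aligned_seqs):
        residues = [c.upper() for c in column if c not in gap_chars]
        if not residues:
            entropies.append(None)      # nothing to score in an all-gap column
            continue
        total = len(residues)
        counts = Counter(residues)
        # H = sum over residue types of p * log2(1/p)
        entropy = sum((n / total) * math.log2(total / n) for n in counts.values())
        entropies.append(entropy)
    return entropies
```

With only a handful of sequences, a single substitution changes a column's entropy noticeably, which is one reason small alignments can make conserved positions look variable.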

If you have any questions, leave a comment and I'll get back to you tomorrow.

score 0 · Answer 4 · 2014-08-27

I've just come across this post and thought I would give another solution. We provide this kind of amino-acid conservation coloring in our web server, http://www.eppic-web.org. We provide precomputed, color-coded PyMOL session (.pse) files for the whole of the PDB. The proteins are already represented as surfaces, which we find useful for analysing the patterns of conservation at the surface.

To get them you can use URLs like:

http://www.eppic-web.org/ewui/ewui/fileDownload?type=entropiespse&id=<PDB code>&alignment=<Chain ID>
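If you need the sessions for many chains, the URL pattern above can be filled in with a few lines of Python. This is only a sketch of the string construction: the helper name `eppic_pse_url` is made up, the PDB code in the usage note is an arbitrary example, and the code makes no request to the server.

```python
from urllib.parse import urlencode

# Base address copied from the URL pattern above.
EPPIC_DOWNLOAD = "http://www.eppic-web.org/ewui/ewui/fileDownload"


def eppic_pse_url(pdb_code, chain_id):
    """Build the download URL for the entropy-colored PyMOL session of one chain."""
    query = urlencode({
        "type": "entropiespse",   # session file colored by sequence entropy
        "id": pdb_code,           # PDB code, passed through unchanged
        "alignment": chain_id,    # chain identifier
    })
    return f"{EPPIC_DOWNLOAD}?{query}"
```

For example, `eppic_pse_url("1smt", "A")` builds the address for chain A of that entry, which can then be pasted into a browser.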

Or otherwise you can access them by clicking on the provided link in the homologs information panel (see the help page).

The MSA contains closely related sequence homologs only (within 50% sequence identity to the structure), and the coloring is based on sequence entropy values computed using 10 classes of amino acids. The actual values are encoded in the B-factor column of the PDB files used to generate the PyMOL sessions. Amino acids colored in red are those that could not be aligned to a reference UniProt sequence (engineered tags and the like). Note that because of the strict identity cutoff we use, the MSA can contain very few sequence homologs (fewer than 10 homologs for around 20% of chains in the PDB at the moment), so you should check whether there are enough homologs for your particular structure in the corresponding EPPIC results page.
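To apply a similar identity cutoff to your own alignment, something like the Python sketch below could be a starting point. It is not EPPIC's code: the function names are made up, "within 50% identity" is read here as at least 50% identical residues over the positions where neither sequence has a gap, and the server may define identity differently.

```python
def percent_identity(ref, other, gap_chars="-."):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for a, b in zip(ref.upper(), other.upper()):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        identical += (a == b)
    return 100.0 * identical / compared if compared else 0.0


def close_homologs(aligned, ref_name, min_identity=50.0):
    """Keep the reference plus every sequence at or above the identity cutoff.

    aligned : dict mapping sequence name -> aligned sequence (hypothetical input)
    """
    ref = aligned[ref_name]
    return {
        name: seq
        for name, seq in aligned.items()
        if name == ref_name or percent_identity(ref, seq) >= min_identity
    }


def has_enough_homologs(kept, minimum=10):
    """True if at least `minimum` homologs remain besides the reference itself."""
    return len(kept) - 1 >= minimum
```

If `has_enough_homologs` returns False, the entropy values rest on very few sequences and should be interpreted with caution.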