I want to get the accessible surface area (ASA) for each residue of a protein structure. I need to do this on the biological assembly, with ligands removed, and ideally for every protein in UniProt. However, I don't have an efficient way to remove ligands. I have data from SIFTS, which maps PDB residues to UniProt residues. What I am thinking is:
- Calculate ASA/RSA for all PDB entries (biological assemblies).
- For each UniProt residue, take the maximum ASA over all corresponding PDB residues and assign that maximum to the UniProt residue.
Is this the right way to do it? I ask because it assumes that, for each protein in UniProt, I will always find a PDB entry without any ligand.
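To make the aggregation step concrete, here is a minimal sketch of what I have in mind. It assumes the per-residue ASA values have already been computed and that the SIFTS mapping has been loaded into a dictionary; the function name, the record layout, and the example identifiers (`1abc`, `2xyz`, `ACC1`) are all made up for illustration.

```python
def max_asa_per_uniprot_residue(asa_records, sifts_map):
    """Return {(uniprot_acc, uniprot_pos): max ASA seen in any PDB entry}.

    asa_records: iterable of (pdb_id, chain, pdb_resnum, asa) tuples
                 (hypothetical layout, e.g. parsed from DSSP or NACCESS output).
    sifts_map:   dict mapping (pdb_id, chain, pdb_resnum) to
                 (uniprot_acc, uniprot_pos) (hypothetical layout).
    """
    best = {}
    for pdb_id, chain, resnum, asa in asa_records:
        target = sifts_map.get((pdb_id, chain, resnum))
        if target is None:
            continue  # residue has no UniProt mapping, skip it
        if asa > best.get(target, float("-inf")):
            best[target] = asa
    return best


# Made-up example: two PDB residues map to the same UniProt position.
records = [
    ("1abc", "A", 10, 35.0),
    ("2xyz", "A", 12, 80.5),
    ("1abc", "A", 11, 5.0),
    ("1abc", "B", 1, 99.0),  # unmapped chain, ignored
]
sifts = {
    ("1abc", "A", 10): ("ACC1", 42),
    ("2xyz", "A", 12): ("ACC1", 42),
    ("1abc", "A", 11): ("ACC1", 43),
}
result = max_asa_per_uniprot_residue(records, sifts)
```

One caveat with taking the maximum: it picks whichever structure leaves the residue most exposed, for whatever reason, so it only removes the ligand effect if at least one mapped structure has that residue ligand-free.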
I am thinking of using DSSP or NACCESS. I hope these two tools handle water and other molecules correctly.
Thanks
The problem is that HETATM records can be part of the protein: modified residues are also stored as HETATM, and ions, which are HETATM too, may be part of the protein as well. Conversely, atoms that are not HETATM may not belong to the protein, for example bound peptides.
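A rough sketch of a partial workaround, assuming the files are in legacy PDB format and declare their modified residues in MODRES records: keep every ATOM record, keep HETATM records whose residue name appears in a MODRES record, and drop the remaining HETATM records. The function name is made up. It does not solve the other two cases: ions that belong to the protein are dropped along with the ligands, and bound peptides stored as ATOM records are kept.

```python
def strip_non_protein_hetatm(pdb_lines):
    """Drop HETATM records unless their residue name is declared in MODRES.

    Relies on fixed-column PDB format: record name in columns 1-6,
    residue name in columns 18-20 of ATOM/HETATM records and in
    columns 13-15 of MODRES records.
    """
    lines = list(pdb_lines)
    # Residue names declared as modified residues in this file.
    modified = {ln[12:15].strip() for ln in lines if ln.startswith("MODRES")}
    kept = []
    for ln in lines:
        is_hetatm = ln[:6].strip() == "HETATM"
        if is_hetatm and ln[17:20].strip() not in modified:
            continue  # water, ion, or other ligand: drop it
        kept.append(ln)  # ATOM, modified-residue HETATM, and all other records
    return kept
```

The filtered lines can then be written back to a file and passed to the ASA tool. Files without MODRES records would lose all of their HETATM residues, so it is worth checking how many entries that affects.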
Makes sense. If you don't know the composition of the ligands, or you have PDBs with modified residues, or the ligands are part of protein chains, then I can't think of an easy way. What you propose, reconstructing the protein ASA without the ligand, may work. You could also do it the other way round and filter out the ligands based on a ligand database search, but I suppose there's a risk of filtering out legitimate parts of the protein. Neither is "easy" computation-wise.