Hi everyone,
Via targeted resequencing, I have identified missense mutations in my gene of interest. These missense mutations are enriched in patients versus control individuals. I made a lollipop plot to visualise where these mutations are located on the protein and to see whether they fall within protein domains. I now want to know whether certain protein domains are enriched for these mutations, or whether the mutations tend to cluster in any other region of the protein.
What is the correct way to achieve this? Are there any papers or tools that I can look into?
Thanks a lot for your suggestions.
Yes, I already used Pfam to retrieve the domain organization of my protein of interest and mapped the mutations onto it, so I know where they are located. My question is how to test for clustering bioinformatically/statistically, so that I can present the result more rigorously than by visual confirmation alone.
I was able to find a 3D model in the Swiss-Model Repository. Is there a tool or test I can use to assess clustering, other than checking it visually?
Thank you very much!
If you have a structure (or a model) of your protein, coloring the mutated residues differently from the rest should help you see whether they are in close spatial proximity. PyMOL can do that easily - an extensive tutorial is here, and you will specifically need the Selection commands. If you want to quantify this beyond visualization, PyMOL can also measure distances between residues. As a rule of thumb, residues closer than 8-10 angstroms in space can be treated as part of the same "patch" within a molecule, and a larger cutoff may be appropriate in some cases.
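One way to turn those distance measurements into a test is a simple permutation approach: compare the mean pairwise distance between the mutated residues with the same statistic for random residue sets of equal size. The sketch below is only an illustration under stated assumptions. It assumes you have already exported one coordinate per residue (for example the C-alpha atom) from your model into a dictionary, and the function names are hypothetical, not part of PyMOL or any other package. It also assumes every residue is equally likely to be mutated, which may not hold in practice.

```python
import math
import random
from itertools import combinations


def mean_pairwise_distance(coords, residues):
    """Mean Euclidean distance over all pairs of the given residues.

    coords: dict mapping residue number -> (x, y, z), e.g. C-alpha atoms.
    residues: iterable of at least two residue numbers present in coords.
    """
    pairs = list(combinations(residues, 2))
    total = sum(math.dist(coords[a], coords[b]) for a, b in pairs)
    return total / len(pairs)


def clustering_permutation_test(coords, mutated, n_perm=10000, seed=0):
    """One-sided permutation test for spatial clustering.

    Null model (an assumption): mutated residues are a uniform random
    sample of all residues in the structure.
    Returns (observed mean pairwise distance, empirical p-value).
    """
    rng = random.Random(seed)
    all_residues = sorted(coords)
    observed = mean_pairwise_distance(coords, mutated)
    k = len(mutated)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_residues, k)
        # Count random sets that are at least as compact as the observed one.
        if mean_pairwise_distance(coords, sample) <= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value
```

A small p-value here would suggest that the mutated residues sit closer together than random residue sets do. Other statistics, such as the number of residue pairs within the 8-10 angstrom cutoff, could be swapped in for the mean distance.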
I will look into it, thank you very much!
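A word of caution about visual confirmation: mutations can occur preferentially in certain nucleotide sequence contexts (e.g. CpG), and these contexts are not necessarily evenly distributed across the CDS of a protein. An apparent cluster may therefore partly reflect local mutability rather than anything about the domain itself.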
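To make that caveat concrete for the domain question, here is a minimal sketch of a one-sided binomial test in which the expected share of mutations in a domain comes from per-residue weights rather than from domain length alone. How to derive the weights (for instance from the number of CpG sites per codon) is left open and is an assumption; with uniform weights the test reduces to a plain length-based null. The function names are hypothetical, and the test treats each mutation as an independent draw.

```python
from math import comb


def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def domain_enrichment(positions, domain, weights):
    """One-sided binomial test for mutation enrichment in a domain.

    positions: mutated residue numbers (1-based), one entry per mutation.
    domain: (start, end) residue numbers, inclusive.
    weights: relative mutability per residue; weights[i] is residue i + 1.
             The choice of weights is an assumption of this sketch.
    Returns (mutations in domain, expected fraction, p-value).
    """
    start, end = domain
    # Expected fraction of mutations in the domain under the weighted null.
    p_domain = sum(weights[start - 1:end]) / sum(weights)
    k = sum(start <= pos <= end for pos in positions)
    return k, p_domain, binomial_sf(k, len(positions), p_domain)
```

For example, with uniform weights over a 100-residue protein, a domain spanning residues 1-20 has an expected fraction of 0.2, and the p-value is the probability of seeing at least the observed number of mutations in the domain by chance.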