Question

Tools to analyze the protein sequence when a mutation occurs

7.8 years ago

bharata1803 ▴ 560

[Edit: added a clearer explanation] Hello,

I have cancer-normal RNA-seq and exome-seq data (human) that I downloaded from NCBI GEO. I have already processed the RNA-seq data to get gene expression levels. Now I want to do variant calling (SNPs or indels) on the data set. I have done this before, but this time I want to cross-check the mutations found in the genome against the protein sequence: I want to see whether there are any changes in the amino acid sequences derived from the mutated genes.

I am looking for a tool that can help me detect mutations (SNPs, deletions, and/or insertions) and "translate" the aligned DNA sequence into an amino acid sequence. From there, I want to check whether any change occurs in the amino acid sequence by comparing it to a reference protein sequence from PDB or UniProt. I remember there is a BLAST tool that can compare a nucleotide sequence to a protein sequence, but I want to include mutation data. Thank you for your suggestions.

sequence alignment • 4.0k views

ADD COMMENT • link updated 7.8 years ago by Khader Shameer 18k • written 7.8 years ago by bharata1803 ▴ 560

Could you please describe the data you want to analyze, the species you study, what you want to get as the result of the analysis, and whether you are interested in homologous sequences in PDB and UniProt?

ADD REPLY • link 7.8 years ago by Petr Ponomarenko ★ 2.8k

Hello, I have edited my question. As for homologous sequences, I don't understand what you mean. Can you explain it further?

ADD REPLY • link 7.8 years ago by bharata1803 ▴ 560

When you compare SNPs to the reference you get from PDB or UniProt, there are at least two options: 1) collect data only for the species you study (human, in your case), or 2) find all homologous sequences that are suitable for analysis. The way you analyze mutations and the tools you need differ between these two approaches. The tools also differ depending on whether you analyze a few mutations thoroughly or many mutations briefly. It looks like you are studying human data and have a lot of SNPs.

For human clinical SNP analysis, some approaches are more common and more broadly accepted. For example, the ACMG guidelines https://www.acmg.net/docs/Standards_Guidelines_for_the_Interpretation_of_Sequence_Variants.pdf are the de facto standard in the US. One very important part of the ACMG guidelines is to use the NCBI RefSeq https://www.ncbi.nlm.nih.gov/refseq/ or LRG http://www.lrg-sequence.org/ databases when publishing the mutations found.

Since you want to analyze SNPs from RNA-seq data, I expect you have a lot of them. The best way to assess these SNPs according to the ACMG guidelines (so that doctors and researchers at hospitals can use your results more easily) is to use public software such as VEP or SnpEff, or commercial software for variant annotation.
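As a rough illustration of what you get back from such an annotator: SnpEff writes its results into an `ANN` field in the VCF INFO column, with pipe-separated subfields that include the protein-level change (HGVS.p). Below is a minimal Python sketch that pulls the protein changes out of one INFO string. The field order follows the SnpEff ANN specification as I understand it; the function names and the example record (genes, transcript IDs, positions) are made up for illustration, so check the header of your own VCF before relying on it.

```python
# Subfield order of the SnpEff "ANN" annotation (assumed from the ANN spec).
ANN_FIELDS = [
    "allele", "annotation", "impact", "gene_name", "gene_id",
    "feature_type", "feature_id", "transcript_biotype", "rank",
    "hgvs_c", "hgvs_p", "cdna_pos", "cds_pos", "aa_pos",
    "distance", "messages",
]


def parse_ann(info):
    """Return one dict per annotation found in a VCF INFO string."""
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            return [
                dict(zip(ANN_FIELDS, ann.split("|")))
                for ann in entry[len("ANN="):].split(",")
            ]
    return []


def protein_changes(info, impacts=("HIGH", "MODERATE")):
    """Keep annotations with a protein-level change and a chosen impact."""
    return [
        (rec["gene_name"], rec["feature_id"], rec["hgvs_p"])
        for rec in parse_ann(info)
        if rec.get("impact") in impacts and rec.get("hgvs_p")
    ]


# Hypothetical INFO string with two annotations for the same variant.
example_info = (
    "DP=54;ANN=T|missense_variant|MODERATE|GENE1|GENE1_ID|transcript|"
    "NM_000000.0|protein_coding|2/5|c.100C>T|p.Arg34Cys|150/2000|100/900|34/299||,"
    "T|upstream_gene_variant|MODIFIER|GENE2|GENE2_ID|transcript|"
    "NM_000001.0|protein_coding||c.-2000G>A|||||2000|"
)

for gene, transcript, change in protein_changes(example_info):
    print(gene, transcript, change)
```

Filtering on impact first keeps the list short enough to review by hand when you have thousands of calls.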

Then you can find mutations of highest interest for your publication and use 3D homology modeling with PDB templates to assess each variant's impact on function. There is free and commercial software for this step as well. Unfortunately, I am only familiar with commercial software for 3D structure modeling.

Dear BioStars moderators, can I name commercial software that can help users? Can I name it if I consult for companies that develop it?

ADD REPLY • link 7.8 years ago by Petr Ponomarenko ★ 2.8k

score 2 · Answer 1 · 2017-02-13

There is no single best tool that would help you do such analyses. You need to combine various tools and methods to infer the impact of a mutation or variant on the protein. While variant annotation and impact assessment tools provide a quantitative or qualitative estimate of the effect of the modification, remember that all of these tools (SnpEff, VEP, PhyloP, etc.) assess position-centric impact, not protein-centric impact. We discussed various strategies for coding variant analytics and related software for variant interpretation in this article published in Briefings in Bioinformatics. However, I must add that the landscape of tools is changing rapidly, though more in performance than in novel concepts and algorithms. These are some recent sequence-structure-function inference papers that employed some of the approaches we outlined in our article.

To clarify, you can do the following analyses:

Identify the mutation/variant -> Map to protein -> Assess sequence impact due to the variant (See the example of SH2B3 here)
Map the mutation to a structure and evaluate structural impact using 3D modeling and molecular dynamics (See the example of a PDZ domain on SH2B3 here and a kinase domain on TGFBR2 here)
Map the mutation to a protein; generate canonical networks; perturb the network and assess various system properties and compare the impact (See the example of INO80D and this example of quantitative analyses of networks)
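The first step in the list (variant -> protein -> sequence impact) can be sketched in a few lines of Python. This is only a toy version under strong assumptions: the variant has already been mapped to a 1-based position on the coding strand of a single CDS (in practice an annotator such as VEP or SnpEff does that mapping for you), and the standard genetic code applies. The function names and the short CDS are made up for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a CDS up to the first stop codon; unknown codons become X."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def apply_variant(cds, pos, ref, alt):
    """Apply a VCF-style change (SNV, insertion, or deletion) at 1-based pos."""
    start = pos - 1
    if cds[start:start + len(ref)].upper() != ref.upper():
        raise ValueError("reference allele does not match the CDS")
    return cds[:start] + alt + cds[start + len(ref):]


def describe_change(ref_protein, mut_protein):
    """Summarize how the mutant protein differs from the reference."""
    if ref_protein == mut_protein:
        return "no protein change"
    for i, (a, b) in enumerate(zip(ref_protein, mut_protein), start=1):
        if a != b:
            if ref_protein[i:] == mut_protein[i:]:
                return f"p.{a}{i}{b}"  # single amino acid substitution
            return f"sequences diverge from residue {i} ({a} -> {b})"
    return (f"protein length changes from {len(ref_protein)} "
            f"to {len(mut_protein)} residues")


cds = "ATGGCCCGTGAATAA"  # hypothetical CDS encoding M-A-R-E-stop
mutant = apply_variant(cds, pos=7, ref="C", alt="T")
print(describe_change(translate(cds), translate(mutant)))  # p.R3C
```

Before comparing against a UniProt entry, make sure the CDS comes from the transcript that matches that entry's isoform; otherwise residue numbers will not line up.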

I hope that, collectively, the manuscripts mentioned above will be a good starting point for the multitude of analyses you could do, and that they potentially inspire you to develop novel tools.

score 0 · Answer 2 · 2017-02-13

I am not sure if this is what you are looking for, but maybe you can try QUILTS (uses protein coding variant calls to build a database containing peptide sequences that contain single nucleotide variants): http://openslice.fenyolab.org/cgi-bin/quilts_cgi_v2.0.pl

Publication (since the site is not very informative): https://www.ncbi.nlm.nih.gov/pubmed/26631509
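To give a feel for what "a database of peptide sequences that contain variants" means, here is a small Python sketch of the general idea. It is not how QUILTS itself is implemented: it only applies one amino acid substitution to a reference protein, digests both versions with a simplified trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages), and keeps the peptides that exist only in the variant form. The function names and the toy protein are made up for illustration.

```python
def tryptic_peptides(protein):
    """Digest a protein with a simplified trypsin rule (no missed cleavages)."""
    peptides = []
    start = 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        # Cleave after K or R, except when the next residue is P.
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def variant_peptides(ref_protein, position, alt_aa):
    """Return peptides that appear only after substituting one residue.

    position is 1-based, as in p.L6V style notation.
    """
    mutant = ref_protein[:position - 1] + alt_aa + ref_protein[position:]
    reference_set = set(tryptic_peptides(ref_protein))
    return [p for p in tryptic_peptides(mutant) if p not in reference_set]


reference = "MAGKTLRPEQKSAVR"  # hypothetical protein sequence
print(tryptic_peptides(reference))
print(variant_peptides(reference, position=6, alt_aa="V"))  # p.L6V
```

Note that a substitution to or from K/R can create or destroy a cleavage site, in which case more than one new peptide appears.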