[Edit: added a clearer explanation] Hello,
So, I have cancer and normal RNA-seq and exome-seq data (human) that I downloaded from NCBI GEO. I have already processed the RNA-seq data to get gene expression levels. Now I want to do something else: variant calling (SNPs and indels) on the same data set. I have done this before, but this time I want to cross-check the mutations found in the genome against the protein sequence. In other words, I want to see whether the amino acid sequences of the mutated genes change.
I am looking for a tool that can detect mutations (SNPs, deletions and/or insertions) and "translate" the aligned DNA sequence into an amino acid sequence. From there, I want to check whether the amino acid sequence changes by comparing it with the reference protein sequence from PDB or UniProt. I remember there is a BLAST tool (blastx) that compares a nucleotide sequence against protein sequences, but I want to include the mutation data as well. Thank you for your suggestions.
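A minimal sketch of the "apply the mutation, translate, compare" idea, in plain Python using only the standard genetic code. It assumes you already have the reference coding sequence (CDS) of the transcript and the variant position in CDS coordinates (1-based); converting genomic VCF coordinates to CDS coordinates (strand, exon structure) is exactly what annotation tools do for you. The function names are illustrative, not from any library, and the comparison is a naive position-by-position one, so it is only meaningful for substitutions; after an indel you would need a proper alignment.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(cds: str) -> str:
    """Translate a CDS codon by codon ('*' = stop, 'X' = unknown codon).

    A trailing partial codon (e.g. after a frameshift) is ignored.
    """
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                   for i in range(0, len(cds) - 2, 3))


def apply_variant(cds: str, pos: int, ref: str, alt: str) -> str:
    """Replace `ref` at 1-based CDS position `pos` with `alt` (SNV, deletion or insertion).

    VCF-style alleles that include a padding base also work, because the
    whole `ref` string is replaced by `alt`.
    """
    start = pos - 1
    if cds[start:start + len(ref)].upper() != ref.upper():
        raise ValueError(f"reference allele {ref!r} does not match CDS at position {pos}")
    return cds[:start] + alt + cds[start + len(ref):]


def compare_proteins(ref_prot: str, alt_prot: str):
    """Naive residue-by-residue comparison; returns (1-based position, ref aa, alt aa).

    '-' marks a position beyond the end of the shorter sequence.
    """
    diffs = []
    for i in range(max(len(ref_prot), len(alt_prot))):
        r = ref_prot[i] if i < len(ref_prot) else "-"
        a = alt_prot[i] if i < len(alt_prot) else "-"
        if r != a:
            diffs.append((i + 1, r, a))
    return diffs
```

For example, `translate("ATGGCTCGTTAA")` gives `MAR*`; applying `G>A` at CDS position 8 turns codon 3 from CGT (Arg) into CAT (His), so `compare_proteins` reports `[(3, 'R', 'H')]`. The translated reference will only match the UniProt sequence if the CDS comes from the same isoform, so it is worth checking that first.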
Could you please describe what data you want to analyze, which species you study, what you want to get as the result of the analysis, and whether you are interested in homologous sequences in PDB and UniProt?
Hello, I have edited my question. As for homologous sequences, I don't understand what you mean. Can you explain it further?
When you compare SNPs to a reference from PDB or UniProt, there are at least two options: 1) collect data only for the species you study (human, in your case), or 2) find all homologous sequences (for example, orthologs from other species) that are suitable for the analysis. The way you analyze mutations, and the tools you need, differ between these two approaches. The tools also differ for analyzing a few mutations thoroughly versus many mutations briefly. It looks like you are studying human data and have a lot of SNPs.
For clinical analysis of human SNPs, some approaches are more common and more broadly accepted. For example, the ACMG guidelines https://www.acmg.net/docs/Standards_Guidelines_for_the_Interpretation_of_Sequence_Variants.pdf are the de facto standard in the US. One very important part of the ACMG guidelines is to use the NCBI RefSeq https://www.ncbi.nlm.nih.gov/refseq/ or LRG http://www.lrg-sequence.org/ databases as the reference when reporting the mutations found.
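Protein changes reported against a RefSeq protein are usually written in HGVS nomenclature. A minimal sketch that formats a single amino acid substitution; it covers substitutions only (deletions, insertions and frameshifts have their own HGVS syntax), and the accession in the example is a hypothetical placeholder. The parentheses mark a predicted consequence, which is the case when the protein change is inferred from DNA or RNA data.

```python
# One-letter to three-letter amino acid codes used in HGVS protein notation.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def hgvs_protein_substitution(accession: str, pos: int, ref_aa: str, alt_aa: str,
                              predicted: bool = True) -> str:
    """Format a single-residue change, e.g. 'NP_x.1:p.(Arg3His)'.

    Synonymous changes are written with '=', stop gains with 'Ter'.
    """
    ref3 = THREE_LETTER[ref_aa]
    if ref_aa == alt_aa:
        change = f"{ref3}{pos}="
    else:
        change = f"{ref3}{pos}{THREE_LETTER[alt_aa]}"
    if predicted:
        change = f"({change})"
    return f"{accession}:p.{change}"
```

For instance, `hgvs_protein_substitution("NP_HYPOTHETICAL.1", 3, "R", "H")` returns `NP_HYPOTHETICAL.1:p.(Arg3His)`. In practice the annotation tools can produce these strings for you; the sketch only shows what the notation encodes.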
As you want to analyze SNPs from RNA-seq data, I expect you will have a lot of them. The best way to assess these SNPs in line with the ACMG guidelines (so that doctors and researchers at hospitals can use your results more easily) is to use public software such as VEP or SnpEff, or commercial software for variant annotation.
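SnpEff writes its predictions into an `ANN` field in the VCF INFO column: one comma-separated entry per allele and feature (usually a transcript), each with pipe-separated sub-fields such as the effect, impact, gene, `HGVS.c` and `HGVS.p`. A minimal sketch, assuming a VCF annotated by SnpEff with the standard `ANN` layout, that keeps only protein-changing annotations of HIGH or MODERATE impact. VEP instead writes a `CSQ` field whose layout is defined in the VCF header, so it would need a different parser. The VCF line in the usage example is made up.

```python
from typing import Dict, List, Tuple

# Sub-field order of a SnpEff ANN entry (standard ANN layout).
ANN_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank", "HGVS.c",
    "HGVS.p", "cDNA.pos/cDNA.length", "CDS.pos/CDS.length", "AA.pos/AA.length",
    "Distance", "ERRORS/WARNINGS/INFO",
]


def parse_ann(info: str) -> List[Dict[str, str]]:
    """Return one dict per ANN entry found in a VCF INFO column."""
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            return [dict(zip(ANN_FIELDS, ann.split("|")))
                    for ann in entry[len("ANN="):].split(",")]
    return []


def protein_changes(vcf_line: str,
                    impacts: Tuple[str, ...] = ("HIGH", "MODERATE")) -> List[Dict[str, str]]:
    """Keep annotations with a selected impact and a non-empty HGVS.p string."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = columns[7]  # CHROM POS ID REF ALT QUAL FILTER INFO
    return [r for r in parse_ann(info)
            if r.get("Annotation_Impact") in impacts and r.get("HGVS.p")]


# Hypothetical SnpEff-annotated VCF record (not real data).
line = ("chr1\t1000\t.\tG\tA\t50\tPASS\tDP=30;ANN="
        "A|missense_variant|MODERATE|GENE1|GENE1|transcript|TX1|protein_coding|"
        "2/5|c.8G>A|p.Arg3His|8/500|8/300|3/99||,"
        "A|upstream_gene_variant|MODIFIER|GENE2|GENE2|transcript|TX2|protein_coding|"
        "|c.-500G>A|||||500|")

for rec in protein_changes(line):
    print(rec["Gene_Name"], rec["Feature_ID"], rec["HGVS.p"])
```

Here only the missense annotation survives the filter; the upstream (MODIFIER) entry has no protein change and is dropped.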
Then you can find mutations of highest interest for your publication and use 3D homology modeling with PDB templates to assess each variant's impact on function. There is free and commercial software for this step as well. Unfortunately, I am only familiar with commercial software for 3D structure modeling.
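Only residues covered by a template structure can be assessed by homology modeling, so a first filter is to keep the variants that fall inside the covered region. A minimal sketch, assuming the covered residue ranges have already been converted to UniProt numbering (PDB residue numbering often differs; residue-level mappings such as SIFTS exist for this). The function name, variant list and range below are hypothetical.

```python
from typing import Iterable, List, Tuple

Variant = Tuple[int, str, str]  # (UniProt position, ref aa, alt aa)


def variants_in_structure(variants: Iterable[Variant],
                          covered: List[Tuple[int, int]]) -> List[Variant]:
    """Keep variants whose position lies in any covered range (inclusive bounds)."""
    return [v for v in variants
            if any(start <= v[0] <= end for start, end in covered)]


# Hypothetical variants and a hypothetical template covering residues 50-300.
variants = [(3, "R", "H"), (120, "G", "S"), (400, "L", "P")]
print(variants_in_structure(variants, covered=[(50, 300)]))
```

Only `(120, "G", "S")` lies inside the hypothetical template range here; variants outside every range would need another template or could not be modeled structurally.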
Dear BioStars moderators, can I name commercial software that can help users? Can I still name it if I consult for the companies that develop it?