Hi,
I am working on a bioinformatics project and now have thousands of insertion/deletion SNPs, all of them in coding regions. For each SNP I know its position, the gene name and the gene location. I want to calculate the protein length with the SNP as a percentage of the protein length without it.
(For example, if the protein is 100 amino acids long when the gene carries the SNP and 1000 amino acids long when it does not, the percentage would be 100/1000 = 10%.)
How can I do this in R, or is there any software that can do it?
Ron
SNPs won't change protein length unless they are nonsense mutations or disrupt the stop codon. In other words, the majority of SNPs only produce synonymous or nonsynonymous changes that do not affect the number of amino acids. Indels (insertions or deletions) can cause frameshift mutations that increase or decrease the length of the protein. Please read some papers that describe the different variant types, and then modify your question to make it clearer.
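
To make the frameshift point concrete, here is a minimal Python sketch (not R, and not a replacement for a proper variant annotation tool) of the ratio asked about: translate the coding sequence up to the first stop codon with and without the indel, then divide the two lengths. The function names, the 0-based position convention and the toy sequence are all assumptions made for illustration. It also assumes you already have the spliced coding sequence (CDS) of the transcript and the variant position relative to that CDS.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[i]
               for i, (a, b, c) in enumerate(product(BASES, repeat=3))}


def protein_length(cds):
    """Count amino acids from the first codon up to the first stop codon.

    If no stop codon is found, all complete codons are counted. In a real
    transcript translation would continue into the 3' UTR, which this
    sketch does not model.
    """
    length = 0
    for i in range(0, len(cds) - 2, 3):
        if CODON_TABLE[cds[i:i + 3].upper()] == "*":
            break
        length += 1
    return length


def apply_variant(cds, pos, ref, alt):
    """Replace ref with alt at a 0-based CDS position (hypothetical convention)."""
    if cds[pos:pos + len(ref)] != ref:
        raise ValueError("reference allele does not match the CDS")
    return cds[:pos] + alt + cds[pos + len(ref):]


def length_ratio(cds, pos, ref, alt):
    """Mutant protein length divided by reference protein length."""
    mutant = apply_variant(cds, pos, ref, alt)
    return protein_length(mutant) / protein_length(cds)


# Toy CDS: ATG GCT GAA TTA GCT AAA TGA (6 amino acids, then a stop codon).
reference_cds = "ATGGCTGAATTAGCTAAATGA"
# Deleting the single G at index 3 shifts the frame: ATG CTG AAT TAG ...
ratio = length_ratio(reference_cds, 3, "G", "")
print(f"{ratio:.0%}")  # prints 50%
```

In this toy case the one-base deletion creates a premature stop after three codons, so the mutant protein is 3/6 = 50% of the reference length. For thousands of real variants, an annotation tool that reports the consequence type (frameshift, stop gained, stop lost) is likely a better starting point than hand-rolled translation.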