It appears that there is a misunderstanding on my part, and this question is an attempt to sort out the resulting conflicts.
Definitions:
- Protein change being a non-silent mutation, where a single nucleotide substitution results in a different amino acid.
- Codon position being its count as the nth amino acid within an exon.
These conflicts started with a question about how to calculate, for a single nucleotide substitution, the protein change and codon position within a protein sequence; in this case, the reference nucleotide sequence is from Ensembl. That question was flagged as a duplicate and pointed to this question and answer. My reading of the answer, which must be wrong, is that to get the protein change you first get the exon's starting location, and then walk the nucleotide sequence three bases at a time until the three-base window being walked contains the location of the mutation.
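To make that reading concrete, here is a minimal sketch of the walk as I currently understand it. It is in Python rather than Perl purely for illustration, and it rests on assumptions that may be exactly where my misunderstanding lies: positions are 1-based, the gene is on the forward strand, and the exon is assumed to begin exactly on a codon boundary. The function name `find_codon` and its parameters are hypothetical.

```python
def find_codon(exon_start, mutation_pos):
    """Walk from exon_start in steps of 3 until the window holds mutation_pos.

    Assumptions (hypothetical simplifications): 1-based positions, forward
    strand, and an exon that starts exactly at the first base of a codon.
    Returns (codon_number, codon_start, offset_in_codon), where codon_number
    is 1-based and offset_in_codon is 0, 1 or 2.
    """
    if mutation_pos < exon_start:
        raise ValueError("mutation lies before the exon start")
    codon_number = 1
    codon_start = exon_start
    # The current window covers codon_start .. codon_start + 2 inclusive.
    while codon_start + 3 <= mutation_pos:
        codon_start += 3
        codon_number += 1
    return codon_number, codon_start, mutation_pos - codon_start


if __name__ == "__main__":
    # Made-up coordinates, only to show the shape of the result.
    print(find_codon(100, 107))
```

If this walk is wrong, I would guess the fault is in where the walk starts rather than in the stepping itself.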
What am I missing to achieve my goal of calculating the protein change and codon position within a protein sequence of a single nucleotide substitution from scratch? I would like to do this from scratch and, if possible, script the calculations, ideally in Perl.
Additionally, if it’s not clear, I am aware of how to translate three-nucleotide sequences to their amino acids, provided I know which three-nucleotide sequences to process.
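For completeness, the part I believe I already have looks roughly like the sketch below (Python again, only as an illustration; a Perl hash would do the same job). It assumes the standard genetic code, DNA letters (T rather than U), and a codon already read on the coding strand. The names `CODON_TABLE` and `protein_change` are hypothetical, and the example codon and position are made up.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (third base varying fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def protein_change(codon, offset, alt_base, codon_number):
    """Describe the amino acid change caused by one substituted base.

    codon        : reference three-letter codon, e.g. "GAG"
    offset       : 0, 1 or 2, position of the substituted base in the codon
    alt_base     : the new base
    codon_number : 1-based count of the codon
    Returns a string such as "E6V", or None if the change is silent.
    """
    codon = codon.upper()
    mutated = codon[:offset] + alt_base.upper() + codon[offset + 1:]
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[mutated]
    if ref_aa == alt_aa:
        return None  # same amino acid, so a silent mutation
    return f"{ref_aa}{codon_number}{alt_aa}"


if __name__ == "__main__":
    print(protein_change("GAG", 1, "T", 6))  # GAG -> GTG
    print(protein_change("GAG", 2, "A", 6))  # GAG -> GAA
```

So the open question is only how to pick the right three nucleotides and the right count to feed into something like this.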
DISCLAIMER: If it’s not clear, I have zero background in bioinformatics or microbiology. If using non-plain English terms, where possible please attempt to pair terms with a plain English contextualized explanation. Many thanks in advance.
+1 @Pierre Lindenbaum: The list of required input and the pseudo-code is a huge help - big thank you. The zero-based, half-open base coordinate system is interesting too! Plan to get to this in the next few days. Cheers!