Hello, this is easy.
What do they mean with 'do not take the ends of the molecules into account'?
If I understand it right, the end of the molecule in this case would be the C Terminal group (-COOH) that exists at the end of the last amino acid in the C-Terminal.
What is the biological explanation and how do I implement this in Python?
I don't know the biological explanation for why they want to ignore the end of this group, but you can implement it easily in Biopython.
Load the FASTA file with Biopython, something like:
from Bio import SeqIO
for record in SeqIO.parse("example.fasta", "fasta"):
    print(record.id)
That loop prints the identifier of each sequence in the file.
To calculate the molecular weight of this polymer, you have two options: write your own logic, or use one of the modules Biopython provides to calculate it directly. I suggest doing both and comparing the results, just for the learning value. I will explain the manual approach first and then the direct Biopython approach.
Manual Molecular Weight Calculation
A protein is a polymer built from 20 standard amino acids (21 counting selenocysteine, and 22 counting pyrrolysine, which occurs in some archaea and bacteria).
You can create a Python dictionary in which each key is an amino acid's single-letter code and each value is that amino acid's molecular weight; you can find these values online.
Create two variables: one for the molecular weight of water (H2O) and one for the molecular weight of the carboxyl group (-COOH).
Now iterate over the sequence one letter at a time and accumulate the molecular weight in a variable, called "Total" for instance, that starts at 0.
Total = 0
for AA in myseq.seq:
    Total += MWs[AA]
Now the tricky part: when two amino acids form a peptide bond, one molecule of water is released. So the number of condensation reactions, and therefore the number of water molecules released, equals N-1, where N is the number of amino acids in the protein sequence.
Now the total molecular weight would be Total = Total - ((N-1) * MW_WATER).
If my reading of "the end of the molecule" is right, you would also subtract the molecular weight of COOH, so the formula becomes:
Total = Total - ((N-1) * MW_WATER) - MW_COOH
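The manual steps above can be put together roughly as follows. This is only a sketch: the names MWS, MW_WATER, MW_COOH and manual_weight are made up for illustration, the weights are approximate average masses of the free amino acids rounded to two decimals, and subtracting COOH follows my reading of "the ends" rather than anything the assignment confirms.

```python
# Approximate average molecular weights (g/mol) of the 20 standard
# free amino acids, rounded to two decimals (illustrative values).
MWS = {
    "A": 89.09, "R": 174.20, "N": 132.12, "D": 133.10, "C": 121.16,
    "E": 147.13, "Q": 146.15, "G": 75.07, "H": 155.16, "I": 131.17,
    "L": 131.17, "K": 146.19, "M": 149.21, "F": 165.19, "P": 115.13,
    "S": 105.09, "T": 119.12, "W": 204.23, "Y": 181.19, "V": 117.15,
}
MW_WATER = 18.015  # H2O
MW_COOH = 45.017   # carboxyl group; assumed meaning of "the end of the molecule"


def manual_weight(sequence, subtract_cooh=False):
    """Sum the free amino acid weights and remove one water per peptide bond.

    If subtract_cooh is True, the C-terminal COOH is subtracted as well.
    """
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    # Raises KeyError for letters outside the 20 standard amino acids.
    total = sum(MWS[aa] for aa in sequence)
    # N residues are joined by N-1 peptide bonds, each releasing one water.
    total -= (len(sequence) - 1) * MW_WATER
    if subtract_cooh:
        total -= MW_COOH
    return total


# Glycylglycine as a tiny check: two glycines minus one water.
print(round(manual_weight("GG"), 2))
print(round(manual_weight("GG", subtract_cooh=True), 2))
```

With a record loaded through SeqIO, you would call it as manual_weight(str(record.seq)).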
BioPython way of calculating Molecular Weight
Please check the first example on this page,
https://biopython.org/docs/1.75/api/Bio.SeqUtils.ProtParam.html
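A minimal sketch of that approach, assuming Biopython is installed. The sequence "KAG" is a made-up placeholder, and MW_COOH is my own constant for the optional COOH subtraction, not something Biopython provides.

```python
from Bio.SeqUtils.ProtParam import ProteinAnalysis

MW_COOH = 45.017  # assumed weight of the C-terminal carboxyl group

sequence = "KAG"  # placeholder; use str(record.seq) for a real FASTA record

# ProteinAnalysis expects the protein sequence as a plain string.
analysis = ProteinAnalysis(sequence)

# molecular_weight() already accounts for the water lost per peptide bond.
full_weight = analysis.molecular_weight()
print("Full molecular weight:", round(full_weight, 2))

# Optional: also drop the C-terminal COOH, following my reading of the assignment.
print("Without C-terminal COOH:", round(full_weight - MW_COOH, 2))
```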
Now compare both Molecular Weights, they should be equal or very close together.
Technically, we should also take into account the protonation states of the acidic and basic amino acids, since these residues are charged at or near physiological pH inside the cell. But that is fine here: the calculated molecular weight will be very close to the actual weight.
I hope this helps.
Are the sequences mRNAs (to be translated into proteins) or actual protein sequences that will still undergo post-translational modifications, e.g. the possible removal of the initiator methionine?
This is the protein sequence we are working with. It starts with a K which stands for Lysine
Yes thank you!! I managed to calculate the molecular weight but how do I subtract these molecular ends as stated in the assignment? Thank you in advance!