End of molecules
4.0 years ago
Pythonnoob • 0

Hello,

For a project we got this question: A FASTA file contains a number of protein sequences. Write a Python script that, given the name of the FASTA file, writes the sequence identifier and the molecular weight for each sequence in the file. Note: do not take the ends of the molecule into account.

What do they mean by 'do not take the ends of the molecule into account'?

What is the biological explanation, and how do I implement this in Python?

Thank you in advance!

Are the sequences mRNAs (to be translated into proteins) or actual protein sequences, which may still undergo post-translational modifications such as the removal of the initiator methionine?

This is the protein sequence we are working with. It starts with a K, which stands for lysine.

>seq_compl complete sequence
KKLYRMHRKDGEWGELKVQFFLKNGFAHCCTSLEIVWFTLGGMMSKQIHGVIAHAWLFKP
DCETLRFACDINQLAVKTSMLFEHVCHIRNEVKDDPFTWLDCPFYQNRDMSYYHFHAQHM
FYFLACCAPDPKNRPMFVYFMSNSHNWEYPVMFCMLYAYCVDENMLWCYRCRKKVSSAVI
NTIPMREMWVAQFVVDMKWFAMHCHVTTWCHGRYWQWYHPPQIMRCEWHFDRPDCWRTLD
NNHPKIRGSFHQFATHIRIEWNRGRMPDYTDMVWKRAYMIKRNHRNRMLKVVFCFLIEMG
RWIADLNRPVEKPNSRIIGCAYGFFEDNTLSCLKNMHTAVVKWHYEHRRIARVRSDTHHT
QQREWDSSQRCAHKQISIQDQHPVEHDHIQKAW

Yes, thank you!! I managed to calculate the molecular weight, but how do I subtract these molecular ends as stated in the assignment? Thank you in advance!

4.0 years ago
drsami ▴ 90

Hello, this is easy.

What do they mean by 'do not take the ends of the molecule into account'?

If I understand it right, the end of the molecule in this case would be the C-terminal carboxyl group (-COOH) of the last amino acid in the chain.
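To put numbers on "the end of the molecule", here is a small sketch that computes the masses of the groups involved from rounded standard atomic weights (the values in ATOMIC_WEIGHT are the usual textbook ones, assumed here, not given in the assignment). It also shows why another common reading of "ignore the ends" exists, which is an assumption on my part rather than what the assignment states: the H on the N-terminus plus the OH on the C-terminus add up to exactly one water.

```python
# Rounded standard atomic weights (g/mol); assumed textbook values.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "O": 15.999}


def group_mass(composition):
    """Sum atomic weights for a composition given as {element: count}."""
    return sum(ATOMIC_WEIGHT[element] * count for element, count in composition.items())


MW_WATER = group_mass({"H": 2, "O": 1})         # H2O
MW_COOH = group_mass({"C": 1, "O": 2, "H": 1})  # carboxyl group, -COOH
# Alternative (assumed) reading of "the ends": N-terminal H plus C-terminal OH.
MW_TERMINI = group_mass({"H": 1}) + group_mass({"O": 1, "H": 1})

print(f"H2O:  {MW_WATER:.3f}")
print(f"COOH: {MW_COOH:.3f}")
print(f"H + OH (both termini): {MW_TERMINI:.3f}")
```

Note that MW_TERMINI equals MW_WATER, so under that alternative reading you would remove one extra water rather than a COOH group.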

What is the biological explanation, and how do I implement this in Python?

I don't know the biological explanation for why they want to ignore this end group, but you can implement it easily in Biopython.

  1. Load the FASTA file with Biopython, using something like:

    from Bio import SeqIO

    for record in SeqIO.parse("example.fasta", "fasta"):
        print(record.id)

This loop prints the identifier of each sequence in the file.
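If you can't install Biopython, a plain-Python reader does the same job for simple FASTA files. This is a minimal sketch, not a replacement for SeqIO: parse_fasta is a hypothetical helper name, and like Biopython's record.id it takes the identifier to be the first word after ">".

```python
def parse_fasta(lines):
    """Yield (identifier, sequence) pairs from FASTA-formatted lines.

    The identifier is the first word after '>', which is also what
    Biopython's record.id gives for a simple header.
    """
    identifier, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, "".join(chunks)  # emit the previous record
            words = line[1:].split()
            identifier, chunks = (words[0] if words else ""), []
        else:
            chunks.append(line)  # a sequence may span several lines
    if identifier is not None:
        yield identifier, "".join(chunks)  # emit the last record


example = [">seq_compl complete sequence\n", "KKLYRMHRKD\n", "GEWGELKVQF\n"]
for identifier, seq in parse_fasta(example):
    print(identifier, seq)
```

For a real file, pass an open handle instead of the list, e.g. `with open("example.fasta") as handle:` and then loop over `parse_fasta(handle)`.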

To calculate the molecular weight of this polymer, you have two options: write your own logic, or use one of the Biopython modules that calculates it directly. I suggest doing both and comparing the results, just for you to learn. I will explain the manual approach first and then show you how to do it directly in Biopython.

Manual Molecular Weight Calculation

  1. A protein is a polymer built from the 20 standard amino acids, plus selenocysteine (U) in some organisms and pyrrolysine (O) in some archaea and bacteria. Create a Python dictionary whose keys are the one-letter amino acid codes and whose values are the amino acid molecular weights; you can grab these values online.

  2. Create two variables: one for the molecular weight of water (H2O) and one for the molecular weight of the carboxyl group (-COOH).

  3. Set a variable called "Total" (for instance) to 0, then iterate over the sequence one letter at a time, adding each amino acid's molecular weight to it:

    for AA in record.seq:
        Total += MWs[AA]

  4. Now the tricky part: when two amino acids form a peptide bond, one molecule of water is removed (a condensation reaction). A chain of N amino acids has N-1 peptide bonds, so the number of water molecules released is N-1, where N is the number of amino acids in the protein sequence. Makes sense, right?

  5. The total molecular weight is then Total = Total - ((N-1) * MW_WATER).

  6. If this reading of "the end of the molecule" is right, also subtract the molecular weight of COOH. Starting again from the raw sum of step 3, steps 5 and 6 together become:

    N = len(record.seq)
    Total = Total - ((N-1) * MW_WATER) - MW_COOH
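Putting steps 1 to 6 together, a sketch of the manual calculation could look like this. The masses in AA_MASS are approximate average masses of the free amino acids, rounded to two decimals (double-check them against the table you use); manual_molecular_weight is a hypothetical name; and the ends="water" option is an assumed alternative reading (the sum of residue masses), not part of this answer.

```python
# Approximate average masses (g/mol) of the free amino acids, rounded to
# two decimals. Illustrative values: check them against your own table.
AA_MASS = {
    "A": 89.09, "R": 174.20, "N": 132.12, "D": 133.10, "C": 121.16,
    "Q": 146.14, "E": 147.13, "G": 75.07, "H": 155.15, "I": 131.17,
    "L": 131.17, "K": 146.19, "M": 149.21, "F": 165.19, "P": 115.13,
    "S": 105.09, "T": 119.12, "W": 204.23, "Y": 181.19, "V": 117.15,
}
MW_WATER = 18.015  # H2O
MW_COOH = 45.017   # carboxyl group, -COOH


def manual_molecular_weight(seq, ends="keep"):
    """Molecular weight of a linear peptide from its one-letter sequence.

    ends="keep":  full chain, both termini included (steps 1-5)
    ends="cooh":  additionally subtract -COOH (step 6, this answer's reading)
    ends="water": additionally subtract the terminal H and OH, i.e. one more
                  water (sum of residue masses; an assumed alternative reading)
    """
    seq = str(seq).upper()
    n = len(seq)
    if n == 0:
        return 0.0
    # Step 3: raw sum; a KeyError here means a letter missing from AA_MASS.
    total = sum(AA_MASS[aa] for aa in seq)
    # Steps 4-5: one water leaves per peptide bond, and there are n - 1 bonds.
    total -= (n - 1) * MW_WATER
    if ends == "cooh":
        total -= MW_COOH
    elif ends == "water":
        total -= MW_WATER
    elif ends != "keep":
        raise ValueError(f"unknown ends option: {ends!r}")
    return total


# Glycylglycine (GG) as a quick check: 2 * 75.07 - 18.015
print(round(manual_molecular_weight("GG"), 3))
```

For the assignment, you would call this on `record.seq` for each record from `SeqIO.parse` and print `record.id` next to the result.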

Biopython way of calculating Molecular Weight

Please check the first example on this page:

https://biopython.org/docs/1.75/api/Bio.SeqUtils.ProtParam.html

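Now compare the two molecular weights. They should be equal or very close before the COOH subtraction of step 6, because Biopython's value keeps both ends of the chain (it only subtracts the N-1 waters of step 5).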
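For the Biopython side, the first example on that page uses ProteinAnalysis. A sketch that gives the identifier and molecular weight for each record might look like this; biopython_weights is a hypothetical helper, and it assumes Biopython is installed and the sequences use standard one-letter codes (an ambiguous letter such as X will most likely make molecular_weight() raise an error).

```python
import io

from Bio import SeqIO
from Bio.SeqUtils.ProtParam import ProteinAnalysis


def biopython_weights(handle):
    """Yield (identifier, molecular weight) for each record in a FASTA source.

    `handle` can be a file name or an open text handle; SeqIO.parse accepts both.
    """
    for record in SeqIO.parse(handle, "fasta"):
        # ProteinAnalysis works on a plain string of one-letter codes.
        analysis = ProteinAnalysis(str(record.seq))
        yield record.id, analysis.molecular_weight()


# In-memory demo; for the assignment pass the file name instead,
# e.g. biopython_weights("example.fasta").
demo = io.StringIO(">dipeptide Gly-Gly\nGG\n")
for identifier, mw in biopython_weights(demo):
    print(identifier, round(mw, 2))
```

Running the manual sketch and this one on the same file side by side is an easy way to do the comparison described above.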

Technically, we should take the protonation states of acidic and basic amino acids into account, since these residues carry charges at or near physiological pH inside the cell. But it is OK: the molecular weight you get will be very close to the actual weight.

I hope this helps.
