Question

How to calculate the mass of whole proteins and subregions?

0

8.4 years ago

hakimelakhrass ▴ 80

I am automating the download of protein records so that I can calculate the mass of each whole protein and of its subregions. Is there a tool that can help with this, or do I have to program it from scratch?

As output I can get a FASTA file from NCBI or a GenBank flat file (as well as other formats). The FASTA file contains no information about the regions. The relevant part of the GenBank file looks like this:

            ##Evidence-Data-END##
FEATURES             Location/Qualifiers
     source          1..230
                     /organism="Mus musculus"
                     /strain="NOD"
                     /db_xref="taxon:10090"
                     /chromosome="18"
                     /map="18"
     Protein         1..230
                     /product="endothelial cell-specific chemotaxis regulator"
                     /note="endothelial cell-specific molecule 2; apoptosis
                     regulator through modulating IAP expression"
                     /calculated_mol_wt=24341
     Region          134..228
                     /region_name="ECSCR"
                     /note="Endothelial cell-specific chemotaxis regulator;
                     pfam15820"
                     /db_xref="CDD:292448"
     CDS             1..230
                     /gene="Ecscr"
                     /gene_synonym="1110006O17Rik; ARIA"
                     /coded_by="NM_001033141.1:82..774"
                     /db_xref="CCDS:CCDS37763.1"
                     /db_xref="GeneID:68545"
                     /db_xref="MGI:MGI:1915795"
ORIGIN      
        1 mlrdisleah glgstltpll ahqlpqgrvr gyssqptttq tsqeilqkss qvslvsnqpv
       61 tprsstmdkq slslpdlmsf qpqkhtlgpg tgtperssss ssssssrrge asldatpspe
      121 ttslqtkkmt illtilptpt sesvltvaaf gvisfivilv vvviilvsvv slrfkcrknk
      181 esedpqkpgs sglsescsta ngekdsitli smrninvnns kgsmsaekil
//

So in theory I can extract the regions from this file with some text mining and take the sequence from the FASTA file. Since that would take some time, I figured I would post and see if anyone has a better solution.
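For simple records like the one above, the text-mining route is fairly short. Here is a minimal sketch in Python using only the standard library. The function name `parse_genpept` is made up for illustration, and the sketch assumes that every Region has a plain `start..end` location (no `join(...)`, `<` or `>`) and that feature keys are indented by exactly five spaces, as in the excerpt. It reads the sequence from the ORIGIN block, so the FASTA file is not needed.

```python
import re

def parse_genpept(text):
    """Return (sequence, regions) from one GenPept flat-file record.

    regions is a list of dicts with keys 'name', 'start', 'end'
    (1-based, inclusive coordinates, as written in the file).
    """
    regions = []
    seq_parts = []
    current = None          # the Region feature being read, if any
    in_features = False
    in_origin = False

    for line in text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if line.startswith("ORIGIN"):
            in_features, in_origin = False, True
            continue
        if line.startswith("//"):   # end of record
            break

        if in_origin:
            # Drop the position numbers and spaces, keep the residues.
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line))
        elif in_features:
            if re.match(r"^ {5}\S", line):
                # A new feature key starts here; forget the previous one.
                current = None
                m = re.match(r"^ {5}Region\s+(\d+)\.\.(\d+)\s*$", line)
                if m:
                    current = {"name": None,
                               "start": int(m.group(1)),
                               "end": int(m.group(2))}
                    regions.append(current)
                continue
            q = re.match(r'^\s+/region_name="([^"]*)"', line)
            if q and current is not None:
                current["name"] = q.group(1)

    return "".join(seq_parts).upper(), regions
```

Calling `parse_genpept(record_text)` on the full record should give the 230-residue sequence and a single region named ECSCR spanning 134..228. Records with compound locations would need a real parser instead of these regular expressions.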

R protiens Amino Acids ExPASY • 2.4k views

ADD COMMENT • link 8.4 years ago by hakimelakhrass ▴ 80

2

If you don't have to work with these files, you could use EnsEMBL's API to extract this kind of information from the database. I think the protein molecular weight is available. You can also compute the mass of any peptide as the sum of the masses of the amino-acid residues (plus water). There are also plenty of online tools for this.
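The "sum of residue masses plus water" calculation can be sketched as follows. This is an illustration only: the function name `peptide_mass` is hypothetical, the table holds the commonly tabulated average (not monoisotopic) residue masses in Da, which are worth checking against whichever reference you rely on, and the code assumes an unmodified chain of the 20 standard amino acids with free termini.

```python
# Average residue masses in Da (amino acid minus one water).
# Assumed values from commonly published tables; verify before relying on them.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519,  "A": 71.0788,  "S": 87.0782,  "P": 97.1167,
    "V": 99.1326,  "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one H2O for the free N- and C-termini

def peptide_mass(sequence):
    """Average mass in Da of an unmodified linear peptide."""
    total = WATER_MASS
    for residue in sequence.upper():
        if residue not in AVERAGE_RESIDUE_MASS:
            raise ValueError(f"unsupported residue: {residue!r}")
        total += AVERAGE_RESIDUE_MASS[residue]
    return total
```

Because the water is added once per chain, the masses of subregions do not simply add up to the mass of the whole protein: each fragment treated as a free peptide carries its own water. Post-translational modifications and signal-peptide cleavage are ignored here and would shift the values seen in a mass spectrum.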

ADD REPLY • link 8.4 years ago by Jean-Karim Heriche 27k

0

I do not have to work with these files, no. The key point is automation. I just need to be able to feed in a list of protein names and receive the MW of each whole protein and of all its subregions. It doesn't matter what I use to achieve that, since the result will only be compared against MALDI output. I will take a look at that API, thanks! I already use an MW calculation tool in R. The problem is getting the masses of the subregions!

ADD REPLY • link 8.4 years ago by hakimelakhrass ▴ 80

0

For MW calculations in an automated setting, I can recommend the EMBOSS suite of tools.

ADD REPLY • link 8.4 years ago by ALchEmiXt ★ 1.9k

0

You can extract the sequences of the regions and compute the masses yourself using a table of masses of amino-acid residues or using the mw() function of the R package Peptides.
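Extracting the region sequences mostly comes down to getting the coordinates right: GenBank locations such as 134..228 are 1-based and inclusive, while Python slices are 0-based and exclude the end. A small sketch, with hypothetical helper names and assuming the regions are already available as (name, start, end) tuples:

```python
def extract_region(sequence, start, end):
    """Return residues start..end (1-based, inclusive) of sequence."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError(f"region {start}..{end} is outside 1..{len(sequence)}")
    # Shift only the start: the exclusive slice end equals the inclusive end.
    return sequence[start - 1:end]

def extract_regions(sequence, regions):
    """Map each region name to its subsequence.

    regions: iterable of (name, start, end) tuples.
    """
    return {name: extract_region(sequence, start, end)
            for name, start, end in regions}
```

For the record in the question, `extract_region(seq, 134, 228)` would return the 95 residues of the ECSCR region (228 - 134 + 1), and each subsequence can then be passed to whatever mass calculation is already in use, such as mw() in Peptides.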

ADD REPLY • link 8.4 years ago by Jean-Karim Heriche 27k

0

That's what I have been doing for the whole protein sequence. Indeed, it should not be too hard to extract the regions based on the GenBank file. I was just wondering whether there was some automated way to extract or identify the regions of a protein. I guess I will do it myself. Thanks!

ADD REPLY • link 8.4 years ago by hakimelakhrass ▴ 80