Help finding proteases that most likely cleaved a peptide from a full-length protein
18 months ago

Hi! I wonder if someone can help me, please.

I have a list of ~800 peptide sequences, for example:

ALHYIHDGIGAMVRKVLELTGK

This is a cleaved peptide from within GATD3B, whose full length sequence is:

MAAVRALVASRLAAASAFTSLSPGGRTPSQRAALHLSVPRPAARVALVLSGCGVYDGTEIHEASAILVH
LSRGGAEVQIFAPDVPQMHVIDHTKGQPSEGESRNVLTESARIARGKITDLANLSAANHDAAIFPGGFG
AAKNLSTFAVDGKDCKVNKEVERVLKEFHQAGKPIGLCCIAPVLAAKVLRGVEVTVGHEQEEGGKWPYA
GTAEAIKALGAKHCVKEVVEAHVDQKNKVVTTPAFMCETALHYIHDGIGAMVRKVLELTGK
#                                      ^--------------------^

or

APPEPVPPPRAAPAPTHV

Which is a cleaved peptide from within VGF, whose full length sequence is:

MKALRLSASALFCLLLINGLGAAPPGRPEAQPPPLSSEHKEPVAGDAVPGPKDGSAPEVRGARNSEPQD
EGELFQGVDPRALAAVLLQALDRPASPPAPSGSQQGPEEEAAEALLTETVRSQTHSLPAPESPEPAAPP
RPQTPENGPEASDPSEELEALASLLQELRDFSPSSAKRQQETAAAETETRTHTLTRVNLESPGPERVWR
ASWGEFQARVPERAPLPPPAPSQFQARMPDSGPLPETHKFGEGVSSPKTHLGEALAPLSKAYQGVAAPF
PKARRPESALLGGSEAGERLLQQGLAQVEAGRRQAEATRQAAAQEERLADLASDLLLQYLLQGGARQRG
LGGRGLQEAAEERESAREEEEAEQERRGGEERVGEEDEEAAEAEAEAEEAERARQNALLFAEEEDGEAG
AEDKRSQEETPGHRRKEAEGTEEGGEEEDDEEMDPQTIDSLIELSTKLHLPADDVVSIIEEVEEKRKRK
KNAPPEPVPPPRAAPAPTHVRSPQPPPPAPAPARDELPDWNEVLPPWDREEDEVYPPGPYHPFPN
# ^----------------^
YIRPRTLQPPSALRRRHYHHALPPSRHYPGREAQARRAQEEAEAEERRLQEQEELENYIEHVLLRRP

(both taken from UniProt)

My goal is to find out which proteases were most likely to have cleaved the specific peptide from the parent protein. I think this is possible to do using an online database such as Proteasix.

My problem is that the Proteasix prediction tool requires the following inputs for each peptide:

  1. The UniProt ID of the full length protein
  2. The start AA position of the cleaved peptide within the full protein (247 for GATD3B (A), 486 for VGF (A))
  3. The stop AA position of the cleaved peptide within the full protein (268 for GATD3B (K), 503 for VGF (V))
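The start and stop positions in items 2 and 3 do not have to be counted by hand, since a plain substring search gives them. Here is a minimal Python sketch, using the GATD3B example from this post. The function name `locate_peptide` is made up for illustration. It returns 1-based, inclusive positions (the convention UniProt uses); whether Proteasix expects exactly that convention is an assumption worth checking against its documentation.

```python
def locate_peptide(protein_seq: str, peptide: str) -> list[tuple[int, int]]:
    """Return every 1-based, inclusive (start, stop) where peptide occurs."""
    # Pasted sequences often contain line breaks or spaces, so strip them first.
    protein_seq = "".join(protein_seq.split()).upper()
    peptide = peptide.strip().upper()
    if not peptide:
        return []
    hits = []
    idx = protein_seq.find(peptide)
    while idx != -1:
        # str.find is 0-based, so add 1 to get the residue number of the start.
        hits.append((idx + 1, idx + len(peptide)))
        idx = protein_seq.find(peptide, idx + 1)
    return hits


# Full-length GATD3B sequence as pasted in the question.
GATD3B = """
MAAVRALVASRLAAASAFTSLSPGGRTPSQRAALHLSVPRPAARVALVLSGCGVYDGTEIHEASAILVH
LSRGGAEVQIFAPDVPQMHVIDHTKGQPSEGESRNVLTESARIARGKITDLANLSAANHDAAIFPGGFG
AAKNLSTFAVDGKDCKVNKEVERVLKEFHQAGKPIGLCCIAPVLAAKVLRGVEVTVGHEQEEGGKWPYA
GTAEAIKALGAKHCVKEVVEAHVDQKNKVVTTPAFMCETALHYIHDGIGAMVRKVLELTGK
"""

gatd3b_hits = locate_peptide(GATD3B, "ALHYIHDGIGAMVRKVLELTGK")
print(gatd3b_hits)
```

The function returns a list rather than a single pair because a short peptide can occur more than once in a protein; an empty list means the peptide was not found in that sequence (for example, because it comes from a different isoform). The same search can be done in R with `regexpr(peptide, sequence, fixed = TRUE)`.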

This information is not hard to gather for a handful of peptides, but I need it for hundreds. Does anyone know how I could automate this? My original dataframe in R contains only the peptide sequence and the gene each peptide was derived from, so I will need to:

  1. get the full-length protein sequence and the corresponding UniProt ID for each peptide,
  2. search the full-length sequence for the shorter peptide sequence, and
  3. extract the start and stop AA positions of each peptide.

Then I can use the Proteasix prediction tool to see which proteases were most likely to have cleaved each peptide from the longer sequence at those start and stop sites.
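The three steps above could be sketched roughly as follows in Python, using only the standard library. This is an illustration under several assumptions, not a tested pipeline: it queries the UniProt REST search endpoint by gene name, assumes the peptides are human (`organism_id` 9606) and that a reviewed entry exists for each gene, and all function names (`build_query_url`, `parse_tsv`, `fetch_entries`, `proteasix_rows`) are hypothetical. The rows are parsed by column position, so the sketch does not rely on the header text UniProt returns.

```python
import urllib.parse
import urllib.request

UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def build_query_url(gene: str, organism_id: int = 9606) -> str:
    """Build a search URL for one gene (human and reviewed-only are assumptions)."""
    query = f"gene_exact:{gene} AND organism_id:{organism_id} AND reviewed:true"
    params = {"query": query, "format": "tsv", "fields": "accession,sequence"}
    return UNIPROT_SEARCH + "?" + urllib.parse.urlencode(params)


def parse_tsv(text: str) -> list[tuple[str, str]]:
    """Turn the TSV reply into (accession, sequence) pairs, skipping the header."""
    pairs = []
    for line in text.splitlines()[1:]:
        cols = line.split("\t")
        if len(cols) >= 2 and cols[0]:
            pairs.append((cols[0], cols[1]))
    return pairs


def fetch_entries(gene: str) -> list[tuple[str, str]]:
    """Step 1: download candidate full-length sequences for a gene."""
    with urllib.request.urlopen(build_query_url(gene)) as resp:
        return parse_tsv(resp.read().decode("utf-8"))


def proteasix_rows(records, lookup=fetch_entries):
    """Steps 2 and 3: locate each peptide and record 1-based start/stop.

    records is an iterable of (gene, peptide) pairs, e.g. exported from R.
    """
    cache = {}  # one lookup per gene, even with many peptides per gene
    rows = []
    for gene, peptide in records:
        if gene not in cache:
            cache[gene] = lookup(gene)
        accession = start = stop = None
        for acc, seq in cache[gene]:
            idx = seq.find(peptide)
            if idx != -1:
                accession, start, stop = acc, idx + 1, idx + len(peptide)
                break  # keep the first entry that contains the peptide
        rows.append({"gene": gene, "peptide": peptide,
                     "accession": accession, "start": start, "stop": stop})
    return rows
```

Calling `proteasix_rows([("VGF", "APPEPVPPPRAAPAPTHV")])` would perform one network request per distinct gene. Rows whose `accession` is `None` are peptides that were not found in any returned sequence and need manual checking (a different isoform, an unreviewed entry, or a gene-name synonym are the likely causes).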

I have tried using different prediction tools that don't require this information but they don't do exactly what I want.

I'm a student and very new to this so thank you for your help!!

R peptides proteasix • 570 views

Hi Ram, did you find a method for this? I have the same need. I could not open the website you suggested, http://www.proteasix.org/. Is this website still working?
