Hi! I wonder if someone could help me, please.
I have a list of ~800 peptide sequences, for example:
ALHYIHDGIGAMVRKVLELTGK
This is a cleaved peptide from within GATD3B, whose full-length sequence is:
MAAVRALVASRLAAASAFTSLSPGGRTPSQRAALHLSVPRPAARVALVLSGCGVYDGTEIHEASAILVH
LSRGGAEVQIFAPDVPQMHVIDHTKGQPSEGESRNVLTESARIARGKITDLANLSAANHDAAIFPGGFG
AAKNLSTFAVDGKDCKVNKEVERVLKEFHQAGKPIGLCCIAPVLAAKVLRGVEVTVGHEQEEGGKWPYA
GTAEAIKALGAKHCVKEVVEAHVDQKNKVVTTPAFMCETALHYIHDGIGAMVRKVLELTGK
                                       ^--------------------^
or
APPEPVPPPRAAPAPTHV
This is a cleaved peptide from within VGF, whose full-length sequence is:
MKALRLSASALFCLLLINGLGAAPPGRPEAQPPPLSSEHKEPVAGDAVPGPKDGSAPEVRGARNSEPQD
EGELFQGVDPRALAAVLLQALDRPASPPAPSGSQQGPEEEAAEALLTETVRSQTHSLPAPESPEPAAPP
RPQTPENGPEASDPSEELEALASLLQELRDFSPSSAKRQQETAAAETETRTHTLTRVNLESPGPERVWR
ASWGEFQARVPERAPLPPPAPSQFQARMPDSGPLPETHKFGEGVSSPKTHLGEALAPLSKAYQGVAAPF
PKARRPESALLGGSEAGERLLQQGLAQVEAGRRQAEATRQAAAQEERLADLASDLLLQYLLQGGARQRG
LGGRGLQEAAEERESAREEEEAEQERRGGEERVGEEDEEAAEAEAEAEEAERARQNALLFAEEEDGEAG
AEDKRSQEETPGHRRKEAEGTEEGGEEEDDEEMDPQTIDSLIELSTKLHLPADDVVSIIEEVEEKRKRK
KNAPPEPVPPPRAAPAPTHVRSPQPPPPAPAPARDELPDWNEVLPPWDREEDEVYPPGPYHPFPN
  ^----------------^
YIRPRTLQPPSALRRRHYHHALPPSRHYPGREAQARRAQEEAEAEERRLQEQEELENYIEHVLLRRP
(both taken from UniProt)
My goal is to find out which proteases were most likely to have cleaved each peptide from its parent protein. I think this can be done with an online tool such as Proteasix.
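Protease specificity is usually described in Schechter-Berger notation: the scissile bond lies between residues P1 and P1', with P4 to P1 on its N-terminal side and P1' to P4' on its C-terminal side. Each peptide has two candidate cleavage sites, one just before its first residue and one just after its last. The sketch below pulls out those windows; it is a minimal illustration, not Proteasix's own code. `cleavage_sites` is a hypothetical helper, the 4-residue window width is an assumption, and positions are 1-based and inclusive.

```python
def cleavage_sites(protein, start, stop, width=4):
    """Return P4..P1|P1'..P4' windows around a peptide's N- and C-terminal ends.

    Hypothetical helper. `start` and `stop` are the peptide's 1-based, inclusive
    positions in `protein`; `width` (assumed 4) is the number of residues kept on
    each side of the scissile bond. Residues beyond either protein terminus are
    shown as '-'.
    """
    padded = "-" * width + protein + "-" * width

    def window(p1):
        # p1 is the 1-based position of P1; the bond lies between p1 and p1 + 1.
        # Residue k of `protein` sits at index k - 1 + width of `padded`.
        i = p1 - 1 + width
        return padded[i - width + 1 : i + 1] + "|" + padded[i + 1 : i + 1 + width]

    # N-terminal site: bond between residue start - 1 (P1) and start (P1').
    # C-terminal site: bond between residue stop (P1) and stop + 1 (P1').
    return window(start - 1), window(stop)
```

With the GATD3B sequence above stored in `gatd3b`, the peptide sits at residues 247-268, and `cleavage_sites(gatd3b, 247, 268)` returns `("MCET|ALHY", "LTGK|----")`. The C-terminal window is empty because the peptide ends at the protein's last residue, so only its N-terminal end can come from a cleavage within this sequence.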
My problem is that the Proteasix prediction tool requires the following inputs:
- The UniProt ID of the full-length protein
- The start AA position of the cleaved peptide within the full-length protein (247 for GATD3B (A), 486 for VGF (A))
- The stop AA position of the cleaved peptide within the full-length protein (268 for GATD3B (K), 503 for VGF (V))
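The start and stop positions come from simple string-search arithmetic: if the peptide begins at 0-based index i of the full-length sequence, its 1-based start is i + 1 and its stop is i + len(peptide). Here is a minimal sketch (`peptide_positions` is a hypothetical helper name). It also reports the residue at each end and returns every occurrence, because a short peptide can match more than once.

```python
def peptide_positions(peptide, protein):
    """Return every match of `peptide` in `protein` as (start, stop, start_aa, stop_aa).

    Positions are 1-based and inclusive, matching UniProt residue numbering.
    """
    hits = []
    i = protein.find(peptide)
    while i != -1:
        start = i + 1                 # 0-based index -> 1-based position
        stop = i + len(peptide)       # same as start + len(peptide) - 1
        hits.append((start, stop, protein[start - 1], protein[stop - 1]))
        i = protein.find(peptide, i + 1)  # keep searching for further (even overlapping) matches
    return hits
```

Run on the two sequences above, it returns `[(247, 268, 'A', 'K')]` for the GATD3B peptide and `[(486, 503, 'A', 'V')]` for the VGF peptide. An empty list means the peptide is not in that sequence (for example, it came from another isoform).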
Whilst this information isn't difficult to gather for a handful of peptides, I need to do it for hundreds. Does anyone know how I could go about this? The only information in my original dataframe in R is the peptide sequence and the gene the peptide was derived from, so I will need to:
1- get the full-length protein sequence and the corresponding UniProt ID for each peptide,
2- match the peptide sequence against the full-length sequence, and
3- pull out the start and stop AA positions for each peptide.
Then I can use the Proteasix prediction tool to see which proteases were most likely to have cleaved each peptide from the full-length sequence at those start and stop sites.
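The three steps above can be sketched in Python (the same logic ports to R). This is a minimal sketch under stated assumptions. `fetch_uniprot` assumes UniProt's REST search endpoint at rest.uniprot.org, using the `gene_exact`, `organism_id` and `reviewed` query fields with TSV output, and it simply takes the first reviewed human hit. `parse_uniprot_tsv` and `proteasix_rows` are hypothetical helpers, not part of any Proteasix interface.

```python
import csv
import io
import urllib.parse
import urllib.request

# Assumption: UniProt's REST search endpoint, returning TSV.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def parse_uniprot_tsv(text):
    """Return (accession, sequence) from the first data row of a two-column TSV, or None."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    if len(rows) < 2:  # header only (or empty): no hit
        return None
    return rows[1][0], rows[1][1]


def fetch_uniprot(gene, organism_id=9606):
    """Step 1: look up the first reviewed entry whose gene name is exactly `gene` (9606 = human)."""
    params = {
        "query": f"gene_exact:{gene} AND organism_id:{organism_id} AND reviewed:true",
        "fields": "accession,sequence",
        "format": "tsv",
    }
    url = UNIPROT_SEARCH + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return parse_uniprot_tsv(resp.read().decode("utf-8"))


def proteasix_rows(peptides, lookup):
    """Steps 2 and 3: one (accession, peptide, start, stop) row per match.

    `peptides` is a list of (peptide, gene) pairs; `lookup(gene)` returns
    (accession, sequence) or None. Positions are 1-based and inclusive;
    peptides that cannot be placed get None positions so they can be checked by hand.
    """
    cache = {}  # query each gene only once
    rows = []
    for peptide, gene in peptides:
        if gene not in cache:
            cache[gene] = lookup(gene)
        hit = cache[gene]
        if hit is None:
            rows.append((None, peptide, None, None))
            continue
        accession, sequence = hit
        i = sequence.find(peptide)
        if i == -1:
            rows.append((accession, peptide, None, None))
        while i != -1:
            rows.append((accession, peptide, i + 1, i + len(peptide)))
            i = sequence.find(peptide, i + 1)
    return rows
```

For example, `proteasix_rows([("ALHYIHDGIGAMVRKVLELTGK", "GATD3B"), ("APPEPVPPPRAAPAPTHV", "VGF")], fetch_uniprot)` queries UniProt once per gene and returns rows that can be written out with `csv.writer`. A gene symbol can map to more than one entry, and the canonical sequence may not be the isoform a peptide came from, so rows with `None` positions are worth checking manually.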
I have tried other prediction tools that don't require this information, but they don't do exactly what I want.
I'm a student and very new to this, so thank you for your help!!
Hi Ram, did you find a method for this? I have the same need. I could not open the website you suggested (http://www.proteasix.org/). Is this website working?