L.S.,
I have a list of proteins from either the UniProtKB or PlasmoDB databases that have a SignalP annotation. These proteins are thus predicted to have a signal peptide, of varying length, for secretion. I can manually remove the sequence corresponding to the predicted signal peptide, but that takes a lot of time :(
I was wondering whether it's possible to perform these kinds of operations automatically, perhaps using some kind of online tool. Or do I need to write a script of some sort to perform the operation?
Kind regards, Arman
If you can get the ranges for each protein (without the signal peptide) in the form of a BED file, then you can use bedtools getfasta (https://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html) to do this.
For the initial table of ranges, you can download the UniProt data in GFF format and parse that table. Can you provide some examples?
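To make the BED idea concrete, here is a minimal Python sketch of how the ranges could be written. It assumes you already know, for each protein, the position of the last signal peptide residue and the total protein length. The function name and the example identifiers and numbers are made up for illustration. BED intervals are 0-based and half-open, so a signal peptide covering residues 1 to 24 gives a mature range that starts at 24.

```python
def mature_bed_line(protein_id, signal_end, protein_length):
    """Return one BED line covering the protein without its signal peptide.

    signal_end is the 1-based position of the last signal peptide residue.
    BED uses 0-based, half-open intervals, so the mature chain runs from
    index signal_end up to (not including) protein_length.
    """
    if not 0 <= signal_end < protein_length:
        raise ValueError(f"invalid signal peptide end for {protein_id}")
    return f"{protein_id}\t{signal_end}\t{protein_length}"


# Hypothetical example values, not taken from any real database entry.
example_proteins = [("PROT_A", 24, 300), ("PROT_B", 18, 152)]
bed_text = "\n".join(mature_bed_line(*p) for p in example_proteins) + "\n"
print(bed_text, end="")
```

Saved to a file, this could then be passed to something like bedtools getfasta -fi proteins.fasta -bed mature.bed -fo mature.fasta (the file names are placeholders). The first BED column has to match the sequence names in the FASTA file.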
If you're able to put together a script, that would be most convenient, I assume.
Here for example: https://www.uniprot.org/uniprot/?query=organism%3A%22Plasmodium%20falciparum%20(isolate%203D7)%20%5B36329%5D%22%20annotation%3A(type%3Asignal)&columns=id%2Centry%20name%2Creviewed%2Cprotein%20names%2Cgenes%2Corganism%2Clength%2Cfeature(SIGNAL)%2Cdatabase(EnsemblProtists)%2Cdatabase(EuPathDB)&sort=score
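For a script-only route without bedtools, something along these lines might work. This is only a sketch under assumptions: it expects the protein sequences in FASTA format, plus a dictionary mapping each accession to the 1-based end position of its signal peptide (which you would have to fill in from the signal peptide column of the table above). The function names and the demo data are invented for illustration. Headers in the UniProt style "sp|ACCESSION|NAME ..." are reduced to the accession; any other header is identified by its first word.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def accession(header):
    """Return the accession from 'db|ACC|NAME ...', else the first word."""
    words = header.split()
    first_word = words[0] if words else ""
    parts = first_word.split("|")
    return parts[1] if len(parts) >= 3 else first_word


def trim_signal_peptides(fasta_lines, signal_ends):
    """Return {accession: mature sequence}.

    signal_ends maps accession -> 1-based end of the signal peptide.
    Proteins missing from signal_ends are returned unchanged.
    """
    mature = {}
    for header, seq in read_fasta(fasta_lines):
        acc = accession(header)
        mature[acc] = seq[signal_ends.get(acc, 0):]
    return mature


# Made-up demo data, not real sequences or accessions.
demo_fasta = [
    ">sp|X00001|DEMO_PLAF7 demo protein",
    "MKLLA",
    "VFGHI",
    ">X00002",
    "MAAAK",
]
demo_result = trim_signal_peptides(demo_fasta, {"X00001": 3})
for acc, seq in demo_result.items():
    print(f">{acc}\n{seq}")
```

For real use, the list of lines would be replaced by an open file handle, and the output written to a new FASTA file instead of printed.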
Alright! I have to process this information in order for me to fully understand what you've done ;) Can you send me the file with the mature protein sequences?
Thank you for your time!