Hello,
I am putting together a data set of post-translational modification (PTM) sites from UniProt. I would like to download this data from the website and store it in a text or CSV file. The information in question is in the image below, enclosed in a red box.
I am currently working in R with the package UniProt.ws and am having a hard time pinning down where in the package something along these lines can be done. Maybe there is another package or language that is better suited for this job; I am not sure.
What would be the best option here? Is it possible to pull this data down with an R script? I do not want to copy and paste all of these sites for each protein in question. I basically only want the information in the PTM / Processing section of UniProt.
Any help would be great.
Edit, for user me or those interested:
Thank you user me, this is what I was looking for. I just need some help with which data is pulled and how it is displayed. I have never used this software before, so it is new to me. Do you know of any tutorials that are directly related to using SPARQL with UniProt? It looks like quite a useful language.
So this looks good, but I am missing some information, mainly the glycosylation sites. I would like to pull the information shown in the image below: all the PTMs that were already pulled plus the glycosylation sites. I am not sure why they did not get pulled with this script. For example, N-Linked (........), which I believe would fall into the "text" column.
The output of the script you posted is what I need, but I need a bit more. The next image shows what I am hoping for in the final data set. I would also like the protein entry and name, if possible. I tried playing with the code but could not work out how it all fits together.
Again, thank you so much for the help! Your write-up has been great, and any resources you can point me to would be appreciated. This tool is amazing! :)
Further details will need to go into a different answer as I am at max answer length :(
All of this has been a great deal of help! I am very close to what I am looking for. Your videos are great, and thank you for the resources for tackling this project. I am still very new to this query language, but it's getting better each day. If it works for you, I have one tweak to add to this script, but I am unsure how to list them as I need.
It involves the evidence section. The SPARQL URL is great, but is it possible to also display the type of evidence, such as Publication, By similarity, UniRule annotation, or Imported? I would just like to see the type displayed as text. If this can be done in addition to what I have, pasted below, that would be great. I believe that is one of the last pieces of the puzzle on this front.
This should be in two OPTIONAL blocks
And this one
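If both patterns are in one OPTIONAL block, then both must be present for the block to match at all. In the data, however, each of them can be present or absent on its own, so each needs its own OPTIONAL block.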
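To make the difference concrete, here is a toy model in plain Python. It does not run SPARQL or talk to UniProt; it only mimics how OPTIONAL binds variables. The property names `text` and `evidence`, the `ex:` prefix in the docstrings, and the three sample annotations are made up for the illustration.

```python
# Toy model of SPARQL OPTIONAL semantics (no SPARQL engine involved).
# "text" and "evidence" are hypothetical stand-ins for two optional properties.

annotations = {
    "a1": {"text": "note 1", "evidence": "ev 1"},  # both present
    "a2": {"text": "note 2"},                      # only text present
    "a3": {"evidence": "ev 3"},                    # only evidence present
}


def one_optional_block(data):
    """Mimics: OPTIONAL { ?a ex:text ?text . ?a ex:evidence ?evidence }"""
    rows = []
    for name, props in data.items():
        if "text" in props and "evidence" in props:
            rows.append((name, props["text"], props["evidence"]))
        else:
            # The block did not match as a whole, so neither variable is bound.
            rows.append((name, None, None))
    return rows


def two_optional_blocks(data):
    """Mimics: OPTIONAL { ?a ex:text ?text } OPTIONAL { ?a ex:evidence ?evidence }"""
    # Each block is matched independently, so each variable is bound on its own.
    return [
        (name, props.get("text"), props.get("evidence"))
        for name, props in data.items()
    ]


for row in one_optional_block(annotations):
    print("one block: ", row)
for row in two_optional_blocks(annotations):
    print("two blocks:", row)
```

With a single block, `a2` and `a3` lose the value they do have, because the block only binds anything when both patterns match. With two blocks, every value that exists is kept.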
I edited my question in response to your answer. Thank you for the help; I have a couple more questions in the edit.