I have several thousand XML files from https://www.predictprotein.org/ calculations for different proteins. Does anyone know of a package for parsing that information in Python or R, so that I can run calculations on it easily?
Please provide a sample of the XML. What kind of information do you want to retrieve? Most of the time, a simple XSL stylesheet does the job if it is a simple query.
Wow, it is definitely worth learning how to use XSL! Just one question about the code: in which part do you select the info for the first column (ADRB2_HUMAN)? I don't understand this language yet.
Hi, I have updated the question with the relevant information, including an example file. I am not familiar with XSL stylesheets but I'll dig into it.
That's not clear to me. Please give me a few lines as an example.
I'm interested in the info inside this feature: <featuretypegroup type="secondary structures">...</featuretypegroup>
Some example lines inside those tags:
The idea would be to get:
I cannot find ENSP0001 in your example.
It's not there; it's a made-up protein ID. The protein ID is usually in the file name.
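Since the question also asks about Python, here is a minimal sketch using only the standard library's `xml.etree.ElementTree`. It makes several assumptions: the files use no XML namespace, the `featuretypegroup` element is nested somewhere below the root, and the protein ID is the file name without its extension, as described above. The function names `secondary_structure_rows` and `parse_directory` are made up for illustration, and because the elements inside the group are not shown in this thread, the code simply reports each nested element's tag and attributes rather than guessing their names.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def secondary_structure_rows(root, protein_id):
    """Yield (protein_id, tag, attributes) for every element nested inside
    <featuretypegroup type="secondary structures">.

    Assumes the document uses no XML namespace.
    """
    # ".//" searches all descendants of root; the predicate filters on @type.
    query = ".//featuretypegroup[@type='secondary structures']"
    for group in root.iterfind(query):
        for elem in group.iter():
            if elem is group:
                continue  # iter() yields the group itself first; skip it
            yield protein_id, elem.tag, dict(elem.attrib)


def parse_directory(directory):
    """Parse every *.xml file in a directory.

    Assumes the protein ID is the file name without the .xml extension.
    """
    for path in sorted(Path(directory).glob("*.xml")):
        root = ET.parse(path).getroot()
        yield from secondary_structure_rows(root, path.stem)
```

Calling `list(parse_directory("some_folder"))` would give one tuple per nested element, which can then be filtered by tag or loaded into a table once the actual element and attribute names are known.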