I have several thousand XML files from https://www.predictprotein.org/ calculations for different proteins. Does anyone know of a package for parsing that information in Python or R, so that I can run some calculations on it easily?
Please provide a sample of the XML. What kind of information do you want to retrieve? Most of the time, a simple XSL stylesheet does the job if it is a simple query.
Wow, it is definitely worth learning how to use XSL! Just one question about the code: in which part do you select the info for the first column (ADRB2_HUMAN)? I don't understand this language yet.
Hi, I have updated the question with the relevant information, including an example file. I am not familiar with XSL stylesheets but I'll dig into it.
That's not clear to me. Please give me a few lines as an example.
I'm interested in the info inside this feature: <featuretypegroup type="secondary structures">...</featuretypegroup>
Some example lines inside those tags:
The idea would be to get:
I cannot find ENSP0001 in your example.
It's not there; it's a made-up protein ID. The protein ID is usually in the file name.
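For anyone who would rather stay in Python than learn XSL, here is a minimal sketch using only the standard library. It assumes the files have no XML namespace and that the protein ID is the file name without its extension, as described above. The function names (`rows_from_root`, `parse_file`, `parse_directory`) are made up for illustration, and since the child elements inside the group are not shown in this thread, the code simply collects the tag and attributes of whatever it finds there.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def rows_from_root(root, protein_id):
    """Return one dict per element nested inside the
    <featuretypegroup type="secondary structures"> block(s)."""
    rows = []
    # ".//" searches at any depth; the predicate filters on the type attribute.
    # Assumption: no XML namespace is declared on these elements.
    query = ".//featuretypegroup[@type='secondary structures']"
    for group in root.iterfind(query):
        for elem in group.iter():
            if elem is group:
                continue  # skip the group element itself
            # Copy the element's attributes, then add our own two keys.
            # If an attribute is itself called "protein" or "tag",
            # our value replaces it.
            rows.append({**elem.attrib, "protein": protein_id, "tag": elem.tag})
    return rows


def parse_file(path):
    """Parse one file; the protein ID is taken from the file name (assumption)."""
    path = Path(path)
    root = ET.parse(path).getroot()
    return rows_from_root(root, path.stem)


def parse_directory(directory):
    """Collect rows from every *.xml file in a directory."""
    rows = []
    for path in sorted(Path(directory).glob("*.xml")):
        rows.extend(parse_file(path))
    return rows
```

The list of dicts returned by `parse_directory` can be passed straight to `pandas.DataFrame(rows)` if you want a table for the calculations. If the files do declare a namespace, the tag in the query has to be qualified with it.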