I'm working on a data-mining project that reads PubMed article XML files in Python and parses their data into a database. The problem is that some inline tags, like <sup></sup>, prevent the complete text from being read, for example the abstract of a paper. The code to read the XML file and the XML file I'm trying to read are posted here.
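Since the code isn't shown, here is a minimal sketch of the whole pipeline. It assumes the standard library's `xml.etree.ElementTree` for parsing and uses `sqlite3` as a stand-in for the unspecified database. The element path (`MedlineCitation/Article/Abstract/AbstractText`) follows the usual PubMed record layout. The sample record, the `abstracts` table and the helper names are hypothetical. The key point is joining `itertext()` instead of reading `.text`, so that text inside and after inline tags like `<sup>` is kept.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like a PubMed record, trimmed to the fields used here.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>12345</PMID>
      <Article>
        <Abstract>
          <AbstractText>Categorical variables were compared using the χ<sup>2</sup> test. Results follow.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>""".encode("utf-8")  # bytes, as read from a file opened in "rb" mode


def abstract_text(abstract_elem):
    """Join every text fragment under each <AbstractText>, including text
    inside and after inline children such as <sup>, <sub> or <i>."""
    parts = ["".join(node.itertext()).strip() for node in abstract_elem.iter("AbstractText")]
    return " ".join(parts)


def load_abstracts(xml_bytes, conn):
    """Parse PubMed-style XML and store (PMID, abstract) rows in a hypothetical table."""
    root = ET.fromstring(xml_bytes)
    conn.execute("CREATE TABLE IF NOT EXISTS abstracts (pmid TEXT PRIMARY KEY, abstract TEXT)")
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        abstract = article.find("MedlineCitation/Article/Abstract")
        text = abstract_text(abstract) if abstract is not None else ""
        conn.execute("INSERT OR REPLACE INTO abstracts VALUES (?, ?)", (pmid, text))
    conn.commit()


conn = sqlite3.connect(":memory:")
load_abstracts(SAMPLE, conn)
print(conn.execute("SELECT pmid, abstract FROM abstracts").fetchall())
# [('12345', 'Categorical variables were compared using the χ2 test. Results follow.')]
```

For real files, `ET.parse(path).getroot()` can replace `ET.fromstring`. Structured abstracts with several labelled `<AbstractText>` sections are simply joined with spaces here.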
Seems like you forgot some input information:
Please provide a reproducible example of "xml files" and your code.
The tag interrupts text reading in Python: for example, it stops "Categorical variables were compared using the χ2test." at the χ and doesn't print any further text.
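Without the code this is a guess, but the symptom matches reading an element's `.text` attribute with ElementTree. In ElementTree's model, `.text` is only the text before the first child element. The content of `<sup>` is the child's own `.text`, and whatever follows `</sup>` is stored as that child's `.tail`. A small demonstration:

```python
import xml.etree.ElementTree as ET

snippet = "<AbstractText>Categorical variables were compared using the χ<sup>2</sup> test.</AbstractText>"
elem = ET.fromstring(snippet)

# .text holds only the characters before the first child element.
print(elem.text)                 # Categorical variables were compared using the χ

# The "2" belongs to the <sup> child; " test." is stored as that child's .tail.
sup = elem.find("sup")
print(sup.text, repr(sup.tail))  # 2 ' test.'

# itertext() yields text and tails in document order, so joining it restores the sentence.
print("".join(elem.itertext()))  # Categorical variables were compared using the χ2 test.
```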
Please provide a reproducible example of "xml files" and your code. Please edit your original post to make it coherent.
PubMed abstracts sometimes contain rudimentary HTML markup. That shouldn't cause problems, but without the code you are using it is hard to say. I have edited the title to better reflect what this is about.
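If you want to keep the meaning of inline markup instead of simply gluing the pieces together (which turns χ<sup>2</sup> into "χ2"), you can walk the tree yourself using `.text` and `.tail`. This is a sketch: the `INLINE_FORMATS` table and the `^`/`_` markers are arbitrary, hypothetical choices, and tags that are not listed keep their text unchanged.

```python
import xml.etree.ElementTree as ET

# Hypothetical rendering rules for inline tags; unlisted tags fall back to "{}".
INLINE_FORMATS = {"sup": "^{}", "sub": "_{}"}


def flatten(elem):
    """Return all text under elem, marking <sup>/<sub> content and keeping
    the text of any other inline tag as-is."""
    out = [elem.text or ""]
    for child in elem:
        inner = flatten(child)                      # child's own text and descendants
        fmt = INLINE_FORMATS.get(child.tag, "{}")
        out.append(fmt.format(inner))
        out.append(child.tail or "")                # text that follows the child inside elem
    return "".join(out)


elem = ET.fromstring("<AbstractText>using the χ<sup>2</sup> test, CO<sub>2</sub> and <i>P</i> &lt; .05</AbstractText>")
print(flatten(elem))  # using the χ^2 test, CO_2 and P < .05
```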
It looks like Python struggles with the "χ" in your χ2test (chi-squared test). This is a special character, and you need to take it into account. Would you please share a link to your XML file and your Python code?
Does Python have problems with Unicode?
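Python 3 strings are Unicode, and ElementTree decodes UTF-8 input (the default encoding for XML without a declaration), so the χ character by itself should not cut the text off. A quick check, assuming the file is UTF-8 encoded:

```python
import xml.etree.ElementTree as ET

# UTF-8 bytes, as returned by open(path, "rb").read() for a UTF-8 file.
data = "<AbstractText>compared using the χ test</AbstractText>".encode("utf-8")

elem = ET.fromstring(data)
print(elem.text)            # compared using the χ test

# The character is decoded to a single code point, U+03C7 GREEK SMALL LETTER CHI.
chi = elem.text.split()[3]
print(hex(ord(chi)))        # 0x3c7
```

If the full sentence comes back here, the truncation is more likely caused by the `<sup>` child element than by the character itself.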