Hello list, I have very large BLAST XML outputs (for use with blast2go) and I need to shrink them considerably according to a chosen criterion (for instance the E-value). Has anybody written a script for this? If not, I will share mine once it is done!
Thanks,
Emmanuel
EDIT: Thank you for the answers. I'm not used to handling XML files, and I accept your proposal, Egon! So here is a subset of my BLAST XML output:
EDIT 2: I removed the example: it was too big, and combined with the new "markup formatting" it made the question very annoying to read.
It is big, but I had to show you several situations (a multi-HSP hit, for example). I want to be able to filter this file using the HSP statistics.
Here is a detailed example of the reduction I would like to perform.
How would you delete the following blocks from the XML (including the tags themselves)?
- everything from <Iteration_stat> to </Iteration_stat>
- everything from <Hsp> to </Hsp> if the HSP E-value (<Hsp_evalue>here</Hsp_evalue>) is > 1e-20
- everything from <Hit> to </Hit> if all of its HSP E-values are > 1e-20
- everything from <Iteration> to </Iteration> if
-- all of its HSP E-values are > 1e-20
-- OR it contains the <Iteration_message>No hits found</Iteration_message> message
Thanks again for your help and advice.
Emmanuel
Please add a full snippet of the XML, including the root element and an element you would like to match. That way, people can suggest the proper XPath query.
When you say filter, do you mean make a smaller XML file, or just extract the key data?
I mean a smaller XML file (the goal is to use blast2go).
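Since the goal is a smaller XML file, the deletion rules listed in the question could be sketched roughly as follows with Python's standard `xml.etree.ElementTree`. This is only a sketch under assumptions: the example file was removed from the question, so I am assuming the usual BLAST XML nesting (`BlastOutput_iterations` > `Iteration` > `Iteration_hits` > `Hit` > `Hit_hsps` > `Hsp`) and that every `Hsp` has an `Hsp_evalue` child. The names `filter_blast`, `filter_file` and `MAX_EVALUE` are made up for illustration.

```python
import xml.etree.ElementTree as ET

MAX_EVALUE = 1e-20  # threshold from the question; HSPs above it are dropped


def filter_blast(root, max_evalue=MAX_EVALUE):
    """Prune a parsed BLAST XML tree in place and return its root.

    Assumes the usual BLAST XML nesting; adjust the tag names if
    your files differ.
    """
    # findall() returns a list, so removing nodes below is safe.
    for container in root.findall(".//BlastOutput_iterations"):
        for iteration in list(container):
            # Rule 1: always drop the <Iteration_stat> block.
            for stat in iteration.findall("Iteration_stat"):
                iteration.remove(stat)

            kept_hits = 0
            for hits in iteration.findall("Iteration_hits"):
                for hit in list(hits):
                    # Rule 2: drop every HSP whose E-value is too high.
                    for hsps in hit.findall("Hit_hsps"):
                        for hsp in list(hsps):
                            evalue = float(hsp.findtext("Hsp_evalue"))
                            if evalue > max_evalue:
                                hsps.remove(hsp)
                    # Rule 3: drop a hit that has no HSP left.
                    if hit.find("Hit_hsps/Hsp") is None:
                        hits.remove(hit)
                    else:
                        kept_hits += 1

            # Rule 4: drop an iteration with no surviving hit,
            # or one flagged "No hits found".
            message = (iteration.findtext("Iteration_message") or "").strip()
            if kept_hits == 0 or message == "No hits found":
                container.remove(iteration)
    return root


def filter_file(in_path, out_path, max_evalue=MAX_EVALUE):
    """Read a BLAST XML file, filter it, and write the smaller file."""
    tree = ET.parse(in_path)  # loads the whole file into memory
    filter_blast(tree.getroot(), max_evalue)
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
```

You would call it as `filter_file("blast.xml", "blast_filtered.xml")` (both file names are placeholders). Two caveats: `ET.parse` reads the whole document into memory, so for really huge outputs a streaming approach such as `ET.iterparse` may be needed instead, and as far as I know ElementTree does not write the original DOCTYPE line back out. I have not checked whether blast2go cares about that.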