I run my BLAST (BLASTN 2.2.26+) searches through a script that splits the FASTA input into N pieces, runs one BLAST instance per piece (in N processes) and, once they have all finished, concatenates the outputs into one file. This works well with the tabular output format, but what about XML? I want to test a piece of software that only accepts XML input (a naive cat of the files does not work), and I'm struggling with that case.
Do you have any idea how to produce one consistent XML output from several (possibly hundreds of) "sub-"outputs? It needs to be parsable by Biopython's NCBIXML. ;-)
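A minimal sketch of one approach (not the code linked later in this thread): BLAST XML (-outfmt 5) wraps one <Iteration> element per query inside <BlastOutput_iterations>, so you can keep the header of the first piece, copy the <Iteration> records of every piece in order, and close the document once. The function name merge_blast_xml is hypothetical, and it assumes every piece was produced by the same BLAST program, version, database and options. It also leaves each piece's <Iteration_iter-num> values untouched, so the numbering restarts in every chunk; if your downstream tool cares about that, you would need to renumber them.

```python
def merge_blast_xml(input_files, output_file):
    """Merge several BLAST XML (-outfmt 5) files into one (hypothetical helper).

    Keeps the XML declaration, DOCTYPE and <BlastOutput> header of the first
    file, copies the <Iteration> records of every file in the given order,
    then closes the document once. Assumes all inputs come from the same
    BLAST program, version, database and options.
    """
    if not input_files:
        raise ValueError("no input files given")
    open_tag = "<BlastOutput_iterations>"
    close_tag = "</BlastOutput_iterations>"
    with open(output_file, "w") as out:
        for i, path in enumerate(input_files):
            with open(path) as handle:
                text = handle.read()  # each piece is assumed to fit in memory
            start = text.find(open_tag)
            end = text.rfind(close_tag)
            if start == -1 or end == -1:
                raise ValueError(f"{path}: no <BlastOutput_iterations> block found")
            body_start = start + len(open_tag)
            if i == 0:
                # Write the header (up to and including the opening tag) once.
                out.write(text[:body_start])
            # Copy only the <Iteration> records between the two tags.
            out.write(text[body_start:end])
        # Close the iterations block and the root element exactly once.
        out.write(close_tag + "\n</BlastOutput>\n")
```

For example, merge_blast_xml(["part1.xml", "part2.xml"], "merged.xml"). Plain string slicing is used rather than a full XML parse so that large pieces are copied verbatim without being re-serialised.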
That worked perfectly. Thanks!
Hi Peter, it looks as though the merge method would work well for my situation, but I am unsure how to call the merge function. I have copied def merge into a new Python file. Can I simply paste this definition before the start of my actual script? And do I call the merge function by writing merge(directorycontainingxmlfilestobemerged, output_filename)?
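If the merge function follows the Galaxy-style signature merge(list_of_input_files, output_filename), which is an assumption worth checking against the copy you have, it expects a list of file names rather than a directory. A small sketch for building that list from a folder: the helper names natural_key and xml_inputs are hypothetical, and the "natural" sort is there so that part2.xml comes before part10.xml and the merged records keep the order of the original FASTA split.

```python
import glob
import os
import re


def natural_key(path):
    """Sort key that orders 'part2.xml' before 'part10.xml'."""
    name = os.path.basename(path)
    # re.split with a capture group alternates text and digit runs,
    # so numbers are compared numerically rather than character by character.
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


def xml_inputs(folder, pattern="*.xml"):
    """List the BLAST XML pieces in *folder*, in natural order."""
    return sorted(glob.glob(os.path.join(folder, pattern)), key=natural_key)
```

With the definition of merge pasted above it in the same script, the call would then look like merge(xml_inputs("blast_pieces"), "merged.xml"), where "blast_pieces" is a placeholder folder name. Writing merged.xml outside that folder avoids picking it up again on a second run.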
Hey, it's good to know I'm not the only one interested, as my question got no votes ;-) You can find the code, modified so that it works as is, here: http://code.google.com/p/bioman/source/browse/BlastXMLmerge.py Enjoy! (And thanks again to the Galaxy folks.)
I guess in some sense I'm one of the Galaxy folk, although not one of their core developers.
Dear @Manu Prestat,
Thanks for the python script.
If I'm not mistaken, to run the script I should do:
python BlastXMLmerge.py merged.xml output1.xml output2.xml output3.xml
Please correct me if I'm wrong. Thanks.
You're right KJ ;-)
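Once the command above has produced merged.xml, a cheap sanity check is to confirm that the file is well-formed and holds as many <Iteration> records (one per query) as the pieces combined. This sketch uses only the standard library's xml.etree.ElementTree.iterparse; the helper names are hypothetical. The definitive test for your use case is still to loop over Bio.Blast.NCBIXML.parse on the merged file.

```python
import xml.etree.ElementTree as ET


def count_iterations(path):
    """Count <Iteration> records in a BLAST XML file.

    Parsing the whole file also raises ET.ParseError if it is not well-formed.
    """
    count = 0
    for _event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == "Iteration":
            count += 1
            elem.clear()  # drop the parsed subtree to limit memory on big files
    return count


def merge_is_complete(input_files, merged_file):
    """True if the merged file has exactly as many records as its inputs."""
    return count_iterations(merged_file) == sum(count_iterations(p) for p in input_files)
```

For example, merge_is_complete(["output1.xml", "output2.xml", "output3.xml"], "merged.xml") should be True after the command above.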
Dear @Manu Prestat,
I tried to run the script on my XML files, but got this error:
-bash: /usr/local/bin/python: Argument list too long
I am trying to run this script on 7,749 files.
I tried it on my Mac (OS X 10.8.4) and on Red Hat Linux; both failed.
Any hints?
Thank you very much
This is a command-line length problem, raised by the shell before Python even starts. First of all, run the script from the same directory as the input XML files so that you can use local paths (directory names make the command longer).
If that is not enough, either do it in batches, or modify the script to accept a folder name as input (although then you will have less control over the file order...).
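A sketch of the batch approach driven from Python: the file list is built inside Python (for example with glob.glob), so the shell never expands 7,749 names, and each call to the merge script gets at most size inputs. Running getconf ARG_MAX shows the limit your system enforces. This assumes the script is invoked as python BlastXMLmerge.py OUTPUT INPUT1 INPUT2 ..., as in the command earlier in this thread, and that its output is itself valid BLAST XML that can be merged again. The batch size of 500 and the intermediate names batch_0.xml, batch_1.xml, ... are arbitrary, hypothetical choices.

```python
import subprocess
import sys


def batches(items, size):
    """Split *items* into consecutive lists of at most *size* elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def merge_in_batches(input_files, output_file, size=500,
                     script="BlastXMLmerge.py"):
    """Merge many BLAST XML files in two rounds to keep each command short.

    Assumes the script is invoked as: python SCRIPT OUTPUT INPUT1 INPUT2 ...
    and that its output can be fed back in as an input.
    """
    intermediates = []
    for n, chunk in enumerate(batches(list(input_files), size)):
        part = f"batch_{n}.xml"  # hypothetical intermediate name
        # Arguments are passed directly (no shell), and each chunk stays short.
        subprocess.run([sys.executable, script, part] + chunk, check=True)
        intermediates.append(part)
    # Second round: merge the few intermediate files, keeping their order.
    subprocess.run([sys.executable, script, output_file] + intermediates,
                   check=True)
    return intermediates
```

With 7,749 inputs and size=500 this makes 16 intermediate files, so the second round is a short command. Pass the inputs already sorted in the order you want, and write the intermediates and the final output somewhere a later glob for the inputs will not pick them up.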