What's the easiest way to split a BLAST+ query into pieces, run blastn on each chunk against NT, and merge the results back together? I presume using -query_loc is better than literally splitting the sequence file. Afterwards, should I just write a script to strip out the headers, parse all of the outputs as XML files, and export only what I need (probably needlessly memory-intensive)? Or is there a tool to join BLAST+ results automatically, or even to automate the entire process?
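For what it's worth, splitting a multi-FASTA query is short enough to script by hand. Here is a minimal sketch; the function name, filenames, and chunk size are my own choices, not anything from BLAST+ itself:

```python
# Split a multi-FASTA query file into fixed-size chunks so that each
# chunk can be run through blastn separately (filenames are hypothetical).
def split_fasta(path, records_per_chunk, out_prefix):
    """Write the sequences in `path` into files holding at most
    `records_per_chunk` records each; return the chunk filenames."""
    chunks, current, record = [], [], []
    with open(path) as fh:
        for line in fh:
            # A new header line closes out the previous record.
            if line.startswith(">") and record:
                current.append("".join(record))
                record = []
                if len(current) == records_per_chunk:
                    chunks.append(current)
                    current = []
            record.append(line)
    if record:
        current.append("".join(record))
    if current:
        chunks.append(current)
    names = []
    for i, chunk in enumerate(chunks):
        name = f"{out_prefix}.{i}.fasta"
        with open(name, "w") as out:
            out.writelines(chunk)
        names.append(name)
    return names
```

Each chunk could then be searched with something like `blastn -query chunk.0.fasta -db nt -outfmt 5 -out chunk.0.xml` (`-outfmt 5` being BLAST+'s XML output), leaving the merge step as the remaining problem.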
It just seems like this should exist given how much memory is required by BLAST+.
Have you verified that BLAST is indeed consuming (excessively) more memory when processing a larger subject file? I thought the queries were handled sequentially, so there should be only a little extra overhead.
What do you mean by "joining the chunks"? Do you want to merge the HSPs?
Yes, I want to join the HSPs, basically anything resulting from the alignment. Essentially I believe I need to keep everything between and including the [?] tags in the XML output, retaining the header of one file to preserve the structure.
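In BLAST+'s XML output (`-outfmt 5`), each query's results sit in an `<Iteration>` element inside `<BlastOutput_iterations>`, so the merge can be done with a standard XML parser. A minimal sketch, assuming the per-chunk files are valid BLAST XML (the tag names follow BLAST's DTD; note this does not renumber `<Iteration_iter-num>` and drops the DOCTYPE declaration on write):

```python
import xml.etree.ElementTree as ET

def merge_blast_xml(paths, out_path):
    """Keep the header of the first BLAST XML file and append the
    <Iteration> elements from every other file into its
    <BlastOutput_iterations> block. This sketch does not renumber
    <Iteration_iter-num> fields, which downstream tools may expect."""
    base = ET.parse(paths[0])
    target = base.getroot().find("BlastOutput_iterations")
    for path in paths[1:]:
        extra = ET.parse(path).getroot().find("BlastOutput_iterations")
        for iteration in extra.findall("Iteration"):
            target.append(iteration)
    base.write(out_path)
```

Because only the `<Iteration>` nodes of the later files are copied, the first file's header fields (program, database, parameters) are preserved once, which matches the "keep the headers of one file" idea above.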
And I have verified that BLAST+ does indeed consume more swap the longer it runs, while it runs fine on smaller (but still large) data sets. In case you're wondering, I am working from a precompiled 64-bit Linux build of BLAST+. I am using soft filtering options, though I have also attempted to run without them, and an E-value cutoff of 0.0001.
BLAST+ doesn't require lots of memory in absolute terms: it simply needs roughly as much RAM as the size of the database you are searching against. Splitting your input will not change this.