Some tools (bowtie2, usearch, etc.) do not accept input data or reference databases above a certain size. What is an efficient strategy for dealing with such cases, other than manual splitting (and later manual assembly of the result files)? The more automated and reproducible, the better.
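For illustration, the splitting step itself is easy to automate deterministically; here is a minimal sketch (the chunk size and `chunk_*.fasta` naming scheme are arbitrary choices, not anything a particular tool requires):

```python
#!/usr/bin/env python3
"""Deterministically split a FASTA file into fixed-size chunks."""
import itertools
import sys

def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file."""
    header, seq = None, []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line, []
            else:
                seq.append(line)
        if header is not None:
            yield header, "".join(seq)

def split_fasta(path, records_per_chunk=10000, prefix="chunk"):
    """Write consecutive records into numbered chunk files; return their names."""
    names = []
    records = read_fasta(path)
    for i in itertools.count():
        batch = list(itertools.islice(records, records_per_chunk))
        if not batch:
            break
        name = f"{prefix}_{i:04d}.fasta"
        with open(name, "w") as out:
            for header, seq in batch:
                out.write(f"{header}\n{seq}\n")
        names.append(name)
    return names

if __name__ == "__main__":
    print("\n".join(split_fasta(sys.argv[1])))
```

Because the records are taken in file order rather than at random, rerunning this on the same input always produces the same chunks, which helps with reproducibility.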
EDIT: More specifically, I'm looking for comments on:
- random splitting (given that some tools use heuristics, I'm not sure I'd get the same results each time if I cut the data into halves in a different way)
- existing tools that are capable of splitting the data into smaller chunks (the most obvious example is formatdb or makeblastdb from NCBI, which produce a formatted database in chunks of about 1 GB)
- merging results from different chunks, whether the input or the reference was split (see the sketch after this list)
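On the merging side: if the *queries* were split, the per-chunk outputs can usually just be concatenated. Splitting the *reference* is trickier, because each query gets hits in every chunk and the best hit has to be re-selected afterwards (and note that database-size-dependent statistics such as BLAST E-values will differ between a chunk and the combined database). A minimal sketch for BLAST-style tabular output (`-outfmt 6`), keeping the single highest-bitscore hit per query; the `results_chunk_*.tsv` naming is hypothetical:

```python
#!/usr/bin/env python3
"""Merge per-chunk BLAST tabular results, keeping the best hit per query."""
import csv
import glob
import sys

best = {}  # query id -> best-scoring row seen so far
for path in sorted(glob.glob("results_chunk_*.tsv")):  # hypothetical naming
    with open(path) as fh:
        for row in csv.reader(fh, delimiter="\t"):
            # In -outfmt 6, column 12 (index 11) is the bit score.
            query, bitscore = row[0], float(row[11])
            if query not in best or bitscore > float(best[query][11]):
                best[query] = row

writer = csv.writer(sys.stdout, delimiter="\t")
for row in best.values():
    writer.writerow(row)
```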
This is probably not helpful, but I generally try to find better software instead. Splitting up may have made sense when BLAST was written in the previous century; it doesn't make much sense now.