What are the fastest possible ways to sort large BAM files (greater than 100 GB)? With newer versions of the HiSeq, data will increase further in size.
samtools sort -m 6000000000
Picard's SortSam
Are there any other tools, or multicore/parallel versions? What are the best options to use?
How are large sequencing centers dealing with this problem?
I may be wrong, but novosort seems to be available as a free binary...
Actually, I was correct about the novosort licensing. When the license expires, multi-threading is switched off, though its single-threaded version is still faster than samtools' single-threaded sort.
I believe the "mt" branch of samtools has now been merged into the trunk on GitHub. I am posting this comment to warn other readers, because I spent quite some time unsuccessfully trying to find the "mt" branch.
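For readers on a current samtools build, here is a minimal sketch of a multi-threaded sort, assuming samtools 1.3 or later. In these versions `-m` is a per-thread limit, so an overall budget like the roughly 6 GB in the question has to be divided by the thread count. The file names, thread count, memory budget, and temp prefix below are placeholders, not recommendations.

```shell
#!/bin/sh
# Hypothetical inputs: adjust to your machine and data.
threads=8
total_mem_mb=6000            # overall memory budget, roughly the 6 GB from the question
in_bam=input.bam             # placeholder file names
out_bam=sorted.bam

# samtools sort -m is per thread, so split the budget across the threads.
per_thread_mb=$((total_mem_mb / threads))

# -@ sets the number of sorting/compression threads, -T the temp-file prefix.
cmd="samtools sort -@ $threads -m ${per_thread_mb}M -T /tmp/sort_tmp -o $out_bam $in_bam"
echo "$cmd"

# Run only if samtools is installed and the input file exists.
if command -v samtools >/dev/null 2>&1 && [ -f "$in_bam" ]; then
    $cmd
fi
```

For files over 100 GB, pointing `-T` at a fast local disk with enough free space is probably worth the trouble, since the sort spills many temporary chunks before merging.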
I mean the free license expires in 15 days, if I am correct.
Thanks @lh3: I need to test novosort on multiple threads, and I also need to see whether the sorted BAM is compatible with GATK and other VC tools.
Actually, I was correct about the novosort licensing. When the license expires, you will be left with a single-threaded version. I do not know how much novosort costs, but it is really great.
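On the compatibility question, one quick first check (not a full validation) is whether the sorter wrote a header declaring coordinate order, which as far as I know GATK expects along with an index. A small sketch, assuming the header is piped in as text; the function name `is_coordinate_sorted` and the file name are made up for illustration.

```shell
# Returns 0 if the SAM header read from stdin declares coordinate sort order.
is_coordinate_sorted() {
    grep -q '^@HD.*SO:coordinate'
}

# Usage with a real file (sorted.bam is a placeholder name):
#   samtools view -H sorted.bam | is_coordinate_sorted && echo "coordinate-sorted"

# Self-contained demonstration on a made-up header:
printf '@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n' \
    | is_coordinate_sorted && echo "coordinate-sorted"
```

This only inspects the header tag; it does not prove the records themselves are in order, so a run of Picard's ValidateSamFile is a sensible follow-up.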