Hi,
Does anyone who has used the GATK pipeline and/or sequence alignment tools such as BWA and Stampy have a feel for what the worst performance bottlenecks are? As I only have access to the one system I am running it on (an SGI ICE cluster with network-attached storage), it is hard to get a feel for what might improve it.
I/O seems to be a big issue in most of the processes I run, and I wonder whether running on a machine with fewer cores but fast direct-attached (RAID/striped) storage would be faster.
Anyway, it's a bit of an unfocussed question; I just wonder whether anyone has real-world experience of running these processes in different scenarios, and what they have found works best.
Thanks a lot
Dan
Is the I/O sequential or random? Also, what kind of interconnect do you have to the NAS? Presumably it's using NFS or something similar?
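If it helps to answer the sequential-versus-random question, here is a rough sketch of how one might compare the two access patterns on a given mount. It is not part of GATK, BWA or Stampy; the function name `read_throughput` and its parameters are made up for illustration. It also assumes the test file is much larger than RAM, or that the page cache has been dropped first, otherwise the numbers mostly reflect cached reads rather than the storage.

```python
import os
import random
import time


def read_throughput(path, block_size=1 << 20, n_blocks=64, randomize=False, seed=0):
    """Read n_blocks blocks from path and return throughput in MB/s.

    Hypothetical helper: reads blocks either in file order (sequential)
    or at randomly chosen block-aligned offsets (random).
    """
    total_blocks = os.path.getsize(path) // block_size
    if total_blocks == 0:
        raise ValueError("file is smaller than one block")
    n = min(n_blocks, total_blocks)

    if randomize:
        # Pick n distinct blocks scattered across the whole file.
        indices = random.Random(seed).sample(range(total_blocks), n)
    else:
        # Read the first n blocks in order.
        indices = range(n)

    bytes_read = 0
    start = time.perf_counter()
    # buffering=0 avoids Python-level buffering; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        for i in indices:
            f.seek(i * block_size)
            bytes_read += len(f.read(block_size))
    elapsed = max(time.perf_counter() - start, 1e-9)
    return bytes_read / elapsed / 1e6
```

Calling it once with `randomize=False` and once with `randomize=True` on a large BAM or FASTQ file sitting on the NAS, then on local disk, would give a crude comparison. A large gap between the two patterns on the NAS would suggest that latency rather than raw bandwidth is the limiting factor.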