I have only recently started to experiment with HMMER (3.0, http://hmmer.org/) and I find it to be somewhat slow, but this is probably because I'm doing everything very naively. I've identified a number of potential issues and am hoping to get some feedback on which avenues are worth pursuing.
Most importantly, I've had issues compiling the optimized code. I'm building it on a PPC cluster that, if I'm not mistaken, has special instructions which HMMER should be able to take advantage of. I tried to pass --enable-altivec and --enable-vmx to the configure script, but it keeps telling me it will go with "dummy" (unoptimized) code anyway. If I don't pass anything and I use gcc as my compiler, it pretends to pick everything up correctly, but then the "make check" target gives many errors. Specifically setting --enable-dummy yields a working executable, but with very low performance. If anyone has experience building HMMER, I would love to pick your brains to get this to work correctly.
I'm trying to run jackhmmer against a database of nine mammalian genomes. In dummy mode, I got it to return reasonable results (but slowly). However, my database is simply a massive FASTA file, and I find it surprising that it is searched as-is. Is there some sort of equivalent to "formatdb" for HMMER that I'm not aware of?
Somewhat unrelated, but the default E-value threshold for jackhmmer (10) seems incredibly "lenient". Shouldn't this be a number that's many orders of magnitude smaller?
Thanks!
P.S. This is my first question. I hope I phrased it clearly and with the correct tagging and formatting.
What is the exact model of PPC? AltiVec/VMX is not available on all models. On Linux, you can check whether it is available by looking at /proc/cpuinfo. On SSE3-supported CPUs, enabling SIMD makes Smith-Waterman tens of times faster.
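The /proc/cpuinfo check can be scripted. Here is a rough sketch in Python, assuming the usual Linux layout: PowerPC kernels report something like "altivec supported" on the "cpu" line, and x86 kernels list sse2 and pni (the kernel's name for SSE3) on the "flags" line. The function name simd_features is made up for illustration, and the exact wording of the cpu line may differ between kernels, so treat a negative result with some caution.

```python
def simd_features(cpuinfo_text):
    """Return the SIMD features mentioned in /proc/cpuinfo-style text.

    Assumes PowerPC reports AltiVec on the "cpu" line and x86 lists
    sse2 / pni (SSE3) on the "flags" line.
    """
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key : value" line
        key = key.strip().lower()
        value = value.strip().lower()
        if key == "cpu" and "altivec" in value:
            found.add("altivec")
        elif key == "flags":
            tokens = set(value.split())
            if "sse2" in tokens:
                found.add("sse2")
            if "pni" in tokens or "sse3" in tokens:
                found.add("sse3")
    return found


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            features = simd_features(handle.read())
    except OSError:
        features = None  # not Linux, or /proc is unavailable
    if features is None:
        print("could not read /proc/cpuinfo")
    elif features:
        print("SIMD features found:", ", ".join(sorted(features)))
    else:
        print("no AltiVec/SSE features reported")
```

If this reports no features on the build machine, the configure script falling back to the "dummy" implementation is the expected outcome rather than a build problem.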
Good point. Turns out we have Power5 chips, which don't have AltiVec, unfortunately.