We're planning to build a very powerful computing machine to serve bioinformatics applications here at HCFMUSP (check my profile). I know that the common choice is to build a cluster or go to the cloud, but our adventurous spirit calls for some experimentation. We are somewhat envious of proprietary solutions using FPGA cards like these ones:
Hi Jarretinha... It would be good if you could add links. For instance, I assume HCFMUSP is in Brazil, but honestly, the abbreviation doesn't mean anything to me. Also, lots of people won't know what an FPGA is: you could make that a link to Wikipedia. Same with the different commercial solutions you're suggesting, etc.
From what I've read, FPGAs are fine, but expensive. Have you looked into other, more readily available (and easier to program, mostly) architectures? I'm thinking of GPUs (CUDA, OpenCL), and Cell B.E. chips (in Playstation 3, programmable via C/C++ and also via OpenCL)?
CUDA is on my wishlist. I've just acquired a SuperServer 6016GT-TF and fully intend to fill it up with NVIDIA Tesla/Fermi cards. CUDA for bioinformatics really works, and it is quite affordable.
Cell chips are nice but hard to program and port to. Compilers and libraries for this type of architecture aren't in good shape right now. Anyway, the speedup is comparable to that of CUDA. OpenCL isn't mature enough either.
By the way, Xilinx FPGAs have PowerPC cores, which makes them a sort of Cell when properly assembled.
A friend of mine (who deploys clusters) said the same thing. But I saw the CLCbio Cube in action on a large dataset and was very impressed. We also have concerns about energy consumption (watts per flop). That's why I'm looking for people with some experience with these things. By the way, the cloud isn't mature enough to deliver the same performance as an in-house cluster. Check out this example - http://www.genomeweb.com/sites/default/files/walker.pdf
Bioinformatics doesn't stop at genome assembly or microarray analysis. Most of what I do depends on sharing a lot of data. So do certain types of alignment. I know that CFD is very different, but many phylogenetics tasks lie in the EP category. Most population genetics tasks lie in the EP/FT category. And many systems biology tasks lie in CG. Your example is just the entry point of the bioinformatics pipeline. Just try to compute the unrooted phylogeny of all Archaea using complete genomes and you'll see the problem.
I would go as far as to say that outside of assembly, very few bioinformatics processes require the kind of tight coupling that can't be addressed by smart distributed computing. FPGAs in particular make the work/$ equation relatively unattractive (in both hardware cost and development cost). GPUs, on the other hand, while not suitable for all problems, and in particular not for many bioinformatics problems, do change the economics a bit.
It all depends on how diverse the applications running on this beast will be. If the end users range from DNA sequencing, NMR and mass spec to crystallography, and the total number of applications is, say, 50+, it is unlikely you will be able to support them all, not only on FPGAs but even on CUDA. Either something installs and compiles (almost) out of the box or you may have to drop it. Software authors will be of no help when it comes to porting it (and possibly a bunch of libraries it depends on) to a new platform they do not even have in house.
On the other hand, whenever the problem is restricted to one domain, FPGAs are great. I used SORCERER for protein mass spec and DeCypher for BLAST searches.
Anyway, have fun with the new servers, whatever they will be :-)
Most people here deal with sequence data and microarrays. So the basic idea is to use FPGAs for sequence data (higher demand) and CUDA for microarrays and related work. Molecular dynamics and related stuff rely on another cluster.
Here, we will develop solutions on FPGAs (the ultimate dream is an FPGA card able to perform Burrows-Wheeler transform based alignments). We are the tinkerers...
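For readers who haven't met Burrows-Wheeler transform based alignment, here is a minimal Python sketch of the core idea: build the BWT of a reference, then count exact matches of a read with backward search. The function names `bwt` and `count_occurrences` and the `$` end marker are assumptions made for illustration. This is not how any particular aligner or FPGA design does it; real tools build the index from a suffix array and sample the occurrence table rather than storing it in full.

```python
from collections import Counter


def bwt(text):
    """Return the Burrows-Wheeler transform of text, using '$' as end marker."""
    text += "$"  # assumed sentinel: sorts before A, C, G, T in ASCII
    # Naive construction: sort all rotations and take the last column.
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_text, pattern):
    """Count exact matches of pattern in the original text via backward search."""
    # first[c]: number of characters in the text that sort before c
    counts = Counter(bwt_text)
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]

    # occ[c][i]: number of occurrences of c in bwt_text[:i]
    occ = {c: [0] * (len(bwt_text) + 1) for c in counts}
    for i, ch in enumerate(bwt_text):
        for c in counts:
            occ[c][i + 1] = occ[c][i] + (ch == c)

    lo, hi = 0, len(bwt_text)  # half-open range of matching sorted suffixes
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in first:
            return 0
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = bwt("ACGTACGT")
    print(index)
    print(count_occurrences(index, "CGT"))
```

Each step of the search loop is just two table lookups and two additions, which is part of why this kind of alignment looks attractive for hardware.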
Anyway, some people might want a proprietary solution. It's good to know they are worth the trouble, though.
How does it sound now? I can add more references.
Thanks for the links, they are most helpful. This is a very interesting question indeed.
Excellent edit! Thanks ;)