Is Smith Waterman ever used?
5
5
Entering edit mode
9.6 years ago
mikko732 ▴ 60

For fun I'm working on a Smith-Waterman implementation to a hardware accelerator (FPGA based) which I'm also designing. So far the results looks better then expected so I thought I talk to some bioInformatic people about it but so far the response been very negative, for no one seems to run Smith-Waterman they all use BLAST. I can understand if the price of the accelerator put them off but even without talking about price they simple aren't interested. So my question to this community: is Smith-Waterman only used in University classes and not in real life or are there actually anyone out there who use it to scan databases?.

For anyone who is interested my accelerator is the size of small portable HDD, hooked up with Thunderbolt and can make protein alignment with Smith-Waterman using Affine gap penalty at 2500 Gcups

alignment sequence • 6.4k views
ADD COMMENT
0
Entering edit mode

Good tool is always useful (do not be fulled by "semi" or "not" interested community) . However, the fact is if you want to use it in the real world you need to be competitive, otherwise you are not ready for an outside world and thus limited to "university classes". Personally, I am working in gray area of sequence similarity (just along the border) and I am frequently dealing strategies to get SW accuracy with "super"-BLAST speed. So a lot of hybrid methods and heuristic strategies to get the best as possible in terms of accuracy in that area.

Also, note that utility of such tool (if truly competitive) extends even further, grey area of string similarity is not only interesting to academia folks :)

ADD REPLY
1
Entering edit mode

Point taken, the problem is knowing what to measure to see that you are competitive. For example so far I measured my implementation against other FPGA accelerator companies and then I feel very competitive as I get far more bang for the money than they do. But the problem is these numbers seems to be more for mutual admiration and not tuned to what the customers are looking for.

ADD REPLY
0
Entering edit mode

Can I use "Affine gap alignment" with BLAST? or I have to program the idea of "Affine gap alignment"?

thanks

ADD REPLY
4
Entering edit mode
9.6 years ago
Michele Busby ★ 2.2k

Did you see Mengyao Zhou's paper on the banded Smith Waterman? It's used in the Mosaik Aligner.

ADD COMMENT
0
Entering edit mode

Yes I seen it but as I first aimed at a pure smith waterman implementation I did not read it that carefully (so many articles and so little time ;-) ) but maybe good idea to go back to it if I want to combine my smith waterman with some heuristic, Thanks for the tip.

ADD REPLY
3
Entering edit mode
9.6 years ago
Dan D 7.4k

bwa uses the smith-waterman algorithm, though not exclusively. Most modern aligners have improved on the base Smith-Waterman method, so in that context it's obsolete.

ADD COMMENT
0
Entering edit mode

I suspected that and it probably means that the requirements on the FPGA implementation are very different from the ones I put up.

ADD REPLY
3
Entering edit mode
9.6 years ago
Asaf 10k

BLAST uses Smith-Waterman, so everyone uses Smith-Waterman eventually. I wonder if your implementation can accelerate BLAST searches.

ADD COMMENT
0
Entering edit mode

Haven't read much about BLAST but I suspect the systolic array used in FPGA's aren't good fit for BLAST. I seen some BLAST implementation on FPGA and those are mostly focused on improving other things in BLAST then the Smith-Waterman part for good reasons I guess

ADD REPLY
0
Entering edit mode

AFAIK BLAST doesn't use Smith-Waterman by default but X-dropoff gapped alignment algorithm. To compute locally optimal alignments you need to apply the -use_sw_tback flag (I always do).

ADD REPLY
2
Entering edit mode
9.6 years ago

Over the last 10 years or so, I've heard several times about hardware acceleration for sequence alignment algorithms, e.g. http://www.ncbi.nlm.nih.gov/pubmed/17946720. I don't know what happened to them but I don't think they're widely used. You may want to find out to get an idea of what community uptake of these is/was.

ADD COMMENT
0
Entering edit mode

Erm Convey? https://www.genomeweb.com/informatics/convey-claims-15-fold-speed-bwa-algorithm

By no means the only FPGA accelerated platform on the market, there's several.

ADD REPLY
1
Entering edit mode
9.6 years ago
amirmhzadeh ▴ 90

It is not super practical yet but there are some GPU-based implementations of Smith-Waterman algorithm out there that take advantage of massively parallel GPUs' architecture to perform sequence searches 10x-50x faster than NCBI BLAST. With much better sensitivity (i.e. finding THE optimal solution). MR-CUDASW, CUDASW++, G-PAS, and SW-CUDA are few examples of such programs.

ADD COMMENT
0
Entering edit mode

Where does this 10x-50x come from? The CUDASW++ 3.0 paper (2013) only shows a difference of 170GCUPS/20=8.5x for BL50. For BL62, the gap is smaller. I am not sure if the authors are running BLAST on a single thread or on the maximum 4 threads their machine has. SW-CUDA gives a GCUPS ~66. MR-CUDASW seems similar to CUDASW++. The highest GCUPS I have seen in these papers is 170. If OP could achieve 2500 GCUPS, that would be quite significant.

ADD REPLY
0
Entering edit mode

First I must make clear before it happens that I'm not interested in getting a discussion war on what technology is better to use GP/GPU/FPGA... they all have their pros and cons and for FPGA's the big draw backs are they are complex to program, hard to scale and super expensive (at least for the big interesting chips).

Now to the real comment, it's really hard to compare solutions even if only doing it in the FPGA field for everyone are using different chips and has different limitations in their implementations. Some of my improvements of course comes from the new chips I use but a big part come from a new Smith Waterman PE I'm currently working on. I'm not yet 100% sure this new PE approach works but I feel that if no one cares about the Smith Waterman it's stupid to spend more time on it.

There are some TCUP solution for Smith Waterman out there (Convey, SciEngines etc) but they all seem to start of from being at least desktop size and minimum $50,000. I'm trying to find out if there are a interest for pocket size solution starting of at $10,000.

For the interested in FPGA's and Smith Waterman I can tell you I'm working with the largest Arria 10 and the solution can be run time configured with gap costs and matrices to at least support PAM 10-500 and BLOSUM 30-100.

ADD REPLY
0
Entering edit mode

Hey,

Did you manage to answer your Q? I am currently working on accelerating SW on FPGA, I am at a fix. BLAST has O(mn) complexity while the accelerated SW has O(m+n-1) complexity. Now what is preventing biologists from using SW? What is the issue that needs to be addressed? Do they even care about the accuracy it offers or are the biologists better off with the heuristics?

Complex to program yes they are but we could be using HLS tool to simplify. Is scaling and expense the main concern? expense not a good trade off for the accuracy and speed it offers?

Thanks in advance

ADD REPLY

Login before adding your answer.

Traffic: 1584 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6