Question

Is Matching Blastn Output A Requirement For Alternative Sequence Matching Implementations?

1

13.7 years ago

Jcd ▴ 30

I've read a paper demonstrating an implementation of sequence alignment and the authors indicate that in order to provide substantial speed gains they must compromise on query size and output fidelity, relative to BLASTn.

From S. Datta, P. Beeraka, and R. Sass, "RC-BLASTn: Implementation and Evaluation of the BLASTn Scan Function", in Proc. FCCM, 2009, pp.88-95.:

"While one could argue that BLAST is a heuristic and 100% compatibility with NCBI BLAST is unnecessary, it is difficult to convince biologists....Thus, in addition to just “better performance” there are still significant challenges including complete compatibility with NCBI BLAST and arbitrary sized queries."

A paper from another group expresses similar sentiments:

"Of the many versions of BLAST, NCBI BLAST [11] has become a de facto standard. Public access is possible either through download of code or directly through the large web-accessible server at NCBI. This standardization motivates the design criteria for accelerated BLAST codes: users expect not only that performance be significantly upgraded, but also that outputs match exactly those given by the original system."

My question is: is this really that important? My understanding is that the initial search in BLASTn is not exhaustive, so providing an exhaustive search would be desirable.

blast sequencing • 2.9k views

ADD COMMENT • link updated 13.7 years ago by Hamish ★ 3.3k • written 13.7 years ago by Jcd ▴ 30

0

I don't understand why describing BLAST as a heuristic is presented as a positive argument. I can see that the heuristic offers a very good compromise between speed and sensitivity/specificity (that is why I use BLAST, by the way). More importantly, I don't understand the NCBI BLAST compatibility issue either. Computing a bit score or E-value is fast and easy. An exhaustive search tool would find more hits and therefore give better sensitivity. That would, of course, convince biologists. Can you cite the article, please?

ADD REPLY • link 13.7 years ago by Manu Prestat 4.1k
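To illustrate the point in the comment above that the bit score and E-value are cheap to compute, here is a minimal sketch of the standard Karlin-Altschul conversions. The function names are hypothetical, and the `LAMBDA` and `K` values are placeholders chosen for illustration; real values depend on the scoring scheme, and NCBI BLAST also applies effective-length corrections that this sketch leaves out.

```python
import math

def bit_score(raw_score, lam, k):
    """Normalise a raw alignment score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def e_value(bits, query_len, db_len):
    """Expected number of chance hits: E = m * n * 2^(-S')."""
    return query_len * db_len * 2.0 ** (-bits)

# Placeholder statistical parameters (assumption, illustration only).
LAMBDA, K = 1.374, 0.711

bits = bit_score(50, LAMBDA, K)
print(f"bit score: {bits:.1f}, E-value: {e_value(bits, 200, 1_000_000):.3g}")
```

Because both formulas are closed-form, any alternative search implementation that reports raw scores could emit the same kind of metrics, even if its hit list differs from BLAST's.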

0

@Manu As I understand it they're saying that their program is better than BLAST because their program is not heuristic... although it's difficult to understand exactly what they mean with such a small quote.

@JCD please provide the paper you got this from; it's not clear what you mean.

ADD REPLY • link 13.7 years ago by Niek De Klein ★ 2.6k

0

@Niek de Klein The first paper seems to consider matching BLASTn a design requirement, but points out other implementations do not. Some of them are significantly faster.

ADD REPLY • link 13.7 years ago by Jcd ▴ 30

score 1 · Answer 1 · 2012-02-07

Both papers appear to describe accelerated BLAST implementations that use FPGA hardware to speed up performance-critical parts of the algorithm. In both cases, producing output identical to NCBI BLAST is desirable for two reasons. First, it makes it easier to check that the acceleration works correctly, since test search results can be compared directly between implementations. Second, accelerated BLAST implementations are commonly targeted at large-scale, systematic users, and identical output lets them replace NCBI BLAST with an alternative implementation with minimal impact on downstream processing of the BLAST results. It also means the user's analysis pipeline can be maintained and developed against NCBI BLAST, which is free and needs no specialist hardware, while the production environment runs the accelerated version, where the hardware and software costs are easier to justify.
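As a rough illustration of the kind of comparison described above, the following sketch diffs two result sets in BLAST tabular format (`-outfmt 6`, default 12 columns). It assumes both implementations can write that format; the function names and the sample records are made up for the example.

```python
import csv
import io

def load_hits(handle):
    """Parse -outfmt 6 lines into {(query, subject, qstart, qend, sstart, send): (evalue, bitscore)}."""
    hits = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        key = (row[0], row[1], int(row[6]), int(row[7]), int(row[8]), int(row[9]))
        hits[key] = (float(row[10]), float(row[11]))
    return hits

def compare_hits(reference, candidate):
    """Return hits missing from the candidate, extra in the candidate, and shared hits whose scores differ."""
    shared = set(reference) & set(candidate)
    only_ref = sorted(set(reference) - set(candidate))
    only_cand = sorted(set(candidate) - set(reference))
    changed = sorted(k for k in shared if reference[k] != candidate[k])
    return only_ref, only_cand, changed

# Invented sample records, for illustration only.
ref_text = (
    "q1\ts1\t100.00\t50\t0\t0\t1\t50\t101\t150\t1e-20\t93.5\n"
    "q1\ts2\t95.00\t40\t2\t0\t5\t44\t10\t49\t3e-10\t60.2\n"
)
cand_text = "q1\ts1\t100.00\t50\t0\t0\t1\t50\t101\t150\t1e-20\t93.5\n"

only_ref, only_cand, changed = compare_hits(
    load_hits(io.StringIO(ref_text)), load_hits(io.StringIO(cand_text))
)
print("missing from candidate:", only_ref)
print("extra in candidate:", only_cand)
print("score differences:", changed)
```

With identical output all three lists are empty, which is what makes exact compatibility convenient as a regression test. An implementation that is deliberately more sensitive would need a looser check, for example requiring only that no reference hit is lost.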

There are many tools for sequence similarity searching that perform more rigorous searches. These generally implement the Smith-Waterman local alignment or Needleman-Wunsch global alignment algorithms. There have been many implementations of these methods over the years, with little standardisation of output formats. That said, the commonly used reference implementation of Smith-Waterman is SSEARCH, part of the FASTA suite. For Needleman-Wunsch I'm aware of only a couple of implementations for sequence searching, of which GGSEARCH from the FASTA suite is probably the best known.
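For readers unfamiliar with what a rigorous search involves, here is a minimal sketch of the Smith-Waterman recurrence that returns only the best local alignment score. It is not how SSEARCH is implemented: the function name, the match/mismatch values and the linear gap penalty are assumptions chosen for brevity, whereas production tools typically use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty, score only)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Every cell of the matrix is filled in, so the cost grows with the product of the two sequence lengths. That exhaustive behaviour is what BLAST's seeding heuristic avoids, and it is why hardware acceleration is attractive for this class of algorithm.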

score 0 · Answer 2 · 2011-11-23

0

13.7 years ago

Damian Kao 16k

Do you mean compatibility with BLAST in terms of identical alignment results? Or compatibility with BLAST in terms of reporting the same type of metrics (bit score, e-value...)?

I think that for non-computational biologists it is important to be able to compare the output of new alignment implementations with their past BLAST results, so it would be a lot more useful if any new implementation also reported the same type of metrics.

If the new implementation is somehow more accurate, I don't think biologists would mind if the alignment results are not identical.