I've read a paper demonstrating an implementation of sequence alignment and the authors indicate that in order to provide substantial speed gains they must compromise on query size and output fidelity, relative to BLASTn.
"While one could argue that BLAST is a heuristic and 100% compatibility with NCBI BLAST is unnecessary, it is difficult to convince biologists....Thus, in addition to just “better performance” there are still significant challenges including complete compatibility with NCBI BLAST and arbitrary sized queries."
A paper from another group expresses similar sentiments:
"Of the many versions of BLAST, NCBI BLAST [11] has become a de facto standard. Public access is possible either through download of code or directly through the large web-accessible server at NCBI. This standardization motivates the design criteria for accelerated BLAST codes: users expect not only that performance be significantly upgraded, but also that outputs match exactly those given by the original system."
My question is, is this really that important? My understanding is the initial search in BLASTn is not exhaustive so providing an exhaustive search would be desirable.
updated 13.1 years ago by Hamish ★ 3.3k • written 13.1 years ago by Jcd ▴ 30
I cannot understand why saying BLAST is a heuristic is a positive argument. I could understand that the heuristic offers a very good compromise between speed and sensitivity/specificity (that is why I use BLAST, by the way). More importantly, I don't understand the NCBI BLAST compatibility issue either. Computing a bit score or E-value is fast and easy. An exhaustive search tool would find more hits and therefore have better sensitivity. That would, of course, convince biologists.
Can you cite the article, please?
@Manu As I understand it they're saying that their program is better than BLAST because their program is not heuristic... although it's difficult to understand exactly what they mean with such a small quote.
@JCD Please cite the paper you got this from; it's unclear what you mean.
@Niek de Klein The first paper seems to consider matching BLASTn a design requirement, but points out other implementations do not. Some of them are significantly faster.
Looking at these papers, both appear to describe accelerated BLAST implementations that use FPGA hardware to improve performance for critical parts of the algorithm. In both cases, producing output identical to the NCBI BLAST implementation is desirable because it makes it easier to assess whether the acceleration is working correctly: test search results can be compared directly between the implementations. Accelerated BLAST implementations are commonly targeted at large-scale systematic users, so identical output also allows NCBI BLAST to be replaced with an alternative implementation with minimal impact on downstream processing of the BLAST results. It also means the user's analysis pipeline can be developed and maintained against NCBI BLAST, which is free and does not require specialist hardware, while the production environment uses the accelerated version, where the hardware and software costs can be more easily justified.
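To illustrate the kind of comparison this makes possible, here is a minimal sketch that diffs two sets of hits. It assumes both programs write BLAST tabular output (the 12 default columns of NCBI BLAST+ `-outfmt 6`); the function names `load_hits` and `compare_hits` are made up for this example, and the sample rows are invented data, not real search results.

```python
import csv
import io

# Default column order of NCBI BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def load_hits(handle):
    """Read tabular hits into a set of tuples so that row order is ignored."""
    reader = csv.reader(handle, delimiter="\t")
    return {tuple(row) for row in reader if len(row) == len(COLUMNS)}

def compare_hits(reference, candidate):
    """Return the hits missing from, added by, and shared with the candidate."""
    ref = load_hits(reference)
    cand = load_hits(candidate)
    return {"missing": ref - cand, "extra": cand - ref, "shared": ref & cand}

# Invented example rows standing in for the two output files.
reference = io.StringIO(
    "q1\ts1\t100.000\t50\t0\t0\t1\t50\t1\t50\t1e-20\t93.5\n"
    "q1\ts2\t90.000\t50\t5\t0\t1\t50\t1\t50\t1e-10\t60.0\n"
)
candidate = io.StringIO(
    "q1\ts1\t100.000\t50\t0\t0\t1\t50\t1\t50\t1e-20\t93.5\n"
)
result = compare_hits(reference, candidate)
print({name: len(rows) for name, rows in result.items()})
```

Note that this compares the fields as strings, so two programs that format the same E-value differently would be reported as disagreeing. That strictness is the point if the goal is byte-identical output, but a looser numeric comparison may be more appropriate otherwise.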
There are many tools for sequence similarity searching that perform more rigorous searches. These generally implement the Smith-Waterman local alignment or Needleman-Wunsch global alignment algorithms. There have been many implementations of these methods over the years and little standardisation of output formats. That said, the commonly used reference implementation for Smith-Waterman is SSEARCH, part of the FASTA suite. For Needleman-Wunsch I'm aware of only a couple of implementations for sequence searching, of which GGSEARCH from the FASTA suite is probably the best known.
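For anyone unfamiliar with what "rigorous" means here, the following is a minimal sketch of the Smith-Waterman recurrence, which fills the whole dynamic programming matrix rather than extending from seeds. It is not how SSEARCH is implemented: the function name and the scoring values (match 2, mismatch -1, linear gap -2) are assumptions chosen for illustration, and real tools typically use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b under a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero, so an
            # alignment can start anywhere in either sequence.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

print(smith_waterman_score("ACGT", "TTACGTTT"))
```

Every cell is evaluated, so the cost grows with the product of the two sequence lengths. That is the price of the exhaustive search, and the reason heuristics such as BLAST exist.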
Do you mean compatibility with BLAST in terms of identical alignment results? Or compatibility with BLAST in terms of the same type of metrics (bit score, E-value, ...)?
I think for non-computational biologists, it is important to be able to compare new alignment implementations to their past BLAST results, so it would be a lot more useful for any new implementations to also output the same type of metrics.
If the new implementation is somehow more accurate, I don't think biologists would mind if the alignment results are not identical.
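As a rough sketch of how those metrics could be reported by another implementation, the standard Karlin-Altschul relations give the bit score as S' = (λS − ln K) / ln 2 and the E-value as E = m·n·2^(−S'). The function names below are invented for this example, and the λ and K values used in the call are placeholders, not the parameters of any particular BLAST scoring scheme. NCBI BLAST also corrects the query and database lengths for edge effects, which is left out here.

```python
import math

def bit_score(raw_score, lam, k):
    """Normalise a raw alignment score S to bits: (lam * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def e_value(raw_score, lam, k, query_len, db_len):
    """Expected number of chance hits scoring at least raw_score."""
    return query_len * db_len * 2.0 ** (-bit_score(raw_score, lam, k))

# Placeholder statistical parameters, chosen only for illustration.
LAMBDA, K = 1.3, 0.5
print(bit_score(50, LAMBDA, K), e_value(50, LAMBDA, K, 100, 1_000_000))
```

Because λ and K depend on the scoring scheme, E-values are only comparable with past BLAST results when the same scoring parameters and search space are used.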