Salmon Algorithm Binary Search
1
0
Entering edit mode
6.3 years ago
pablo ▴ 310

Hi guys,

I've got a question about Salmon's mapping algorithm. I read a paper talking about that :

"In SA, we matched successively longer prefixes (left-to-right) of query string (binary search)

In BWT, we will match successively longer suffixes (right-to-left) of query string (reverse BWT transform)"

So, for Salmon's algorithm which uses a SA, it is doing a binary search? I get confused when I read that

Best, Vincent

salmon • 1.5k views
ADD COMMENT
0
Entering edit mode

tagging: Rob

ADD REPLY
3
Entering edit mode
6.3 years ago
Rob 6.9k

Hi Vincent,

The answer to your question is yes --- the mapping algorithm uses binary search --- but with two very important twists. First, it's important to note that the mapping algorithm augments the suffix array with a hash table. This hash table stores, for each k-mer, the _interval_ of the suffix array where suffixes beginning with this k-mer as a prefix begin. Recall that since suffixes in the SA are sorted lexicographically, all suffixes starting with a given k-mer will be contiguous in the SA. This means that instead of the binary search being O(M lg N), it is O(M lg W) where W is the width of this suffix array interval. For sufficiently long k-mers, this leads to intervals which typically have a very short width. Second, the mapping algorithm uses the "simple accelerant" (as suggested in Gusfield), which remembers the size of the prefix which must already be matched as we conduct the search --- this leads to a practical algorithm that works like O(M + lg W) in most cases (though, we don't use the LCP array so it's technically O(M lg W) in the worst case). The main benefit of using the uncompressed suffix array like this is that these modifications make search very fast, and since the suffix array is uncompressed, both count and locate queries are extremely efficient.

ADD COMMENT
0
Entering edit mode

Thanks for answering me

So, Salmon doest not use a BWT algorithm, and just uses an uncompressed SA ? Whereas HISAT uses a SA to build a BWT matrix to search pattern ?

ADD REPLY
0
Entering edit mode

Salmon (in mapping based mode), uses an uncompressed SA aided by a hash table. HISAT uses a hierarchical FM-index (itself based on the BWT and the suffix array).

ADD REPLY

Login before adding your answer.

Traffic: 1965 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6