Hi all, I'm playing with an alignment algorithm, that should look like BLAT (I think).
From the Reference sequence, I've sorted all the words of size=K
AAAAAAACT index in ref=1901
AAAAAAACT 2504
AAAAAACTT 2505
(...)
TTTTTTACC 189
TTTTTTTTA 87
when we want to map a sequence on the chromosome, the query is also indexed
AAAAAAACG 20
AAAAAACGA 21
(...)
TTAATGGAG 4
so, the words of the query can be quickly mapped to the reference using a binary-search. At the end, I got an array of hits:
struct Hit
{
int position_in_ref;
int position_in_query;
int word=K;
}
that should look like this (the figure is from the BLAT paper)
Now, how should I use this information to get the best alignment(s). I suspect a simple solution should be something like a dynamic algorithm but I'm lost at this point. Any suggestion ?
by "alignments", do you mean gapped or ungapped?
gapped alignments