Hi all,
I am trying to locate patterns using a suffix array, so I have implemented a plain suffix array that finds patterns by binary search. On small reference genomes such as Caenorhabditis elegans it runs fast. On the human genome, however, it is extremely slow: searching for a pattern of length 20 takes more than one minute.
I know that binary search over a large genome incurs more cache misses, but I think one minute per pattern is abnormal. Could you please help me solve this problem, or recommend a suffix array implementation that is suited to large genomes?
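To make the setup concrete, a plain suffix-array lookup can be sketched roughly as below. This is a minimal Python illustration and not the actual implementation in question: the names `build_suffix_array` and `find_occurrences` are made up for this example, and the naive construction is only meant for toy inputs. Each probe compares at most `len(pattern)` characters, and a binary search over roughly 3 billion suffixes needs only about 32 probes per bound, which is why a minute per pattern looks abnormal.

```python
def build_suffix_array(text):
    # Naive construction by sorting whole suffixes.
    # Illustrative only: far too slow and memory-hungry for a real genome.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of pattern in text using suffix array sa."""
    m = len(pattern)

    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with pattern.
    return sorted(sa[start:lo])


text = "banana"
sa = build_suffix_array(text)
hits = find_occurrences(text, sa, "ana")
```

In this sketch each comparison slices only the first `len(pattern)` characters of a suffix, so the cost per probe does not depend on the genome length.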
Thanks