Hi -
So I'm just wondering how exactly the BWT ends up creating "runs" of the same character in the last column after sorting the rotations of the string. Is it because of some conditional frequency of nucleotides that I'm not aware of? I get that in English some characters are more likely to appear next to certain other characters (that's how the BWT's effectiveness is explained in every text I've found so far), but does this also apply to nucleotides?
Also, a 4-symbol alphabet helps a lot in keeping the data structures small.
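
To make the question concrete, here's a minimal sketch of the transform as I understand it. It's the naive version that sorts every rotation, which is not how real tools build it, and the names `bwt` and `count_runs` plus the `$` end marker are just my own choices for illustration.

```python
def bwt(text, sentinel="$"):
    """Naive BWT: sort all rotations of text + sentinel, return the last column.

    Assumes the sentinel does not occur in the text and sorts before every
    other symbol (true for "$" versus letters in ASCII).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # Build every rotation of s, then sort them lexicographically.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The BWT is the last character of each sorted rotation.
    return "".join(rot[-1] for rot in rotations)


def count_runs(s):
    """Count maximal runs of identical characters in s."""
    return sum(1 for i, ch in enumerate(s) if i == 0 or ch != s[i - 1])


if __name__ == "__main__":
    toy = "ACGTACGTACGT"  # deliberately repetitive toy input, not real data
    out = bwt(toy)
    print(out)                      # TTT$AAACCCGGG
    print(count_runs(toy + "$"))    # 13 runs before the transform
    print(count_runs(out))          # 5 runs after the transform
```

On this toy string the output collapses into long runs because every A is preceded by T, every C by A, and so on, so rotations starting with the same context all end in the same symbol. Whether real nucleotide sequences have enough of that kind of repeated context is exactly the part I'm unsure about.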