Hello.
I'm wondering whether there is a way to determine if the frequency of a specific k-mer within a sequence of a given length is statistically significant.
For example, let's say that we have the following sequence:
ATAGATCATAGATAGATGGAGTTACT
The 5-mer ATAGA occurs 3 times in it (counting overlapping occurrences).
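To make the count concrete, here is a minimal Python sketch of how I am counting. The function name `kmer_positions` is only my own illustration, not something from a library. It slides a window over the sequence and allows matches to overlap.

```python
def kmer_positions(sequence, kmer):
    """Return the 0-based start positions of every match, overlaps allowed."""
    k = len(kmer)
    # Check every window of length k; the range is empty if kmer is longer
    # than the sequence.
    return [i for i in range(len(sequence) - k + 1) if sequence[i:i + k] == kmer]


seq = "ATAGATCATAGATAGATGGAGTTACT"
positions = kmer_positions(seq, "ATAGA")
print(len(positions), positions)  # 3 [0, 7, 11]
```

The matches at positions 7 and 11 share one A, so a non-overlapping count (for example Python's `str.count`) would give 2 instead of 3. I assume any model would need to state which convention it uses.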
1) How can we calculate the probability that this 5-mer appears 3 times in that specific sequence?
2) Is this probability statistically significant? Could it mean something?
I'm not looking for ready-made R libraries that can calculate this, but for mathematical/statistical models or ideas for approaching it.
Thank you