I recently had a very similar question about a motif I was working on, and I took two approaches, one analytical and one empirical:
First approach:
I did almost what you suggested. In addition, I took the base composition of the scanned sequences into account. For example, if the sequence is 30% A, the probability of seeing an A at a specific position is 0.3. The probability of a 7-mer like AATGCCA would be:
(prob. of A) ^ 3 * (prob. of C) ^ 2 * (prob. of T) ^ 1 * (prob. of G) ^ 1
This probability times the number of scanned windows (10000 - 7 + 1 = 9994 for a 10 kb sequence) gives the number of occurrences that we would expect just by chance.
What I did not like about this approach is that it assumes the scanned windows are independent of each other, which they are not, because each window overlaps its neighboring windows by 6 bases.
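The calculation in this first approach can be sketched in a few lines of Python. This is only an illustration under the same independence assumption; the function name `expected_count` is my own and not from any library.

```python
from collections import Counter
from math import prod


def expected_count(sequence, motif):
    """Expected number of chance occurrences of `motif` in `sequence`,
    treating every window as independent (hypothetical helper)."""
    n = len(sequence)
    # Base composition of the scanned sequence, e.g. {"A": 0.3, ...}
    freqs = {base: count / n for base, count in Counter(sequence).items()}
    # Probability of the exact k-mer: product of its per-position base frequencies
    p_motif = prod(freqs.get(base, 0.0) for base in motif)
    # Number of windows of length k in a sequence of length n
    windows = n - len(motif) + 1
    return p_motif * windows


if __name__ == "__main__":
    seq = "ACGT" * 2500  # toy 10 kb sequence with 25% of each base
    print(expected_count(seq, "AATGCCA"))
```

With a uniform base composition, every 7-mer has probability 0.25^7, so the expected count for the toy sequence is 9994 * 0.25^7.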
Second approach:
I shuffled the nucleotides of the scanned sequence many times, each time scanning the randomly generated sequence and counting the occurrences of my motif. On average, this gave me exactly the result that I predicted with my first approach. I guess that the dependence of the windows is not of practical relevance for long sequences.
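A minimal sketch of the shuffling approach, assuming a plain mononucleotide shuffle and counting overlapping matches. The names `count_motif` and `shuffled_counts`, the default of 1000 shuffles, and the fixed seed are illustrative choices of mine.

```python
import random


def count_motif(sequence, motif):
    """Count occurrences of `motif` in `sequence`, including overlapping ones."""
    k = len(motif)
    return sum(
        1 for i in range(len(sequence) - k + 1) if sequence[i:i + k] == motif
    )


def shuffled_counts(sequence, motif, n_shuffles=1000, seed=0):
    """Shuffle the nucleotides `n_shuffles` times and count the motif each time."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    bases = list(sequence)
    counts = []
    for _ in range(n_shuffles):
        rng.shuffle(bases)  # preserves base composition, destroys order
        counts.append(count_motif("".join(bases), motif))
    return counts


if __name__ == "__main__":
    seq = "ACGT" * 2500
    counts = shuffled_counts(seq, "AATGCCA", n_shuffles=100)
    print(sum(counts) / len(counts))  # mean count over the shuffled sequences
```

The mean of the returned counts is the empirical counterpart of the expected count from the first approach.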
In addition, one could calculate something like a p-value for the observed occurrences, asking: what is the probability of finding the motif at least as often as we did just by chance?
In terms of my first approach, one would use a Poisson distribution whose mean is the expected count, i.e. the calculated probability of the k-mer times the number of scanned windows n.
In terms of the second approach, one would just count in how many of the shuffled sequences the motif was found as often as actually observed, or more often. For example: you found the motif twice in your scanned sequence. After shuffling the sequence 1000 times, in 30 cases you found the motif twice or more. This would give you a p-value of 30/1000 = 0.03.
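Both p-value calculations can be sketched as follows. This is a rough illustration, not a definitive implementation: the function names are hypothetical, the Poisson mean is assumed to be the expected count (k-mer probability times number of windows), and the empirical version uses the simple fraction from the example above.

```python
from math import exp, factorial


def poisson_p_value(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    # One minus the probability of seeing fewer than `observed` occurrences
    below = sum(
        exp(-expected) * expected ** i / factorial(i) for i in range(observed)
    )
    return 1.0 - below


def empirical_p_value(observed, null_counts):
    """Fraction of shuffled sequences with at least `observed` occurrences."""
    hits = sum(1 for count in null_counts if count >= observed)
    return hits / len(null_counts)


if __name__ == "__main__":
    # Example from the text: motif seen twice; 30 of 1000 shuffles reach 2 or more
    null_counts = [2] * 30 + [0] * 970
    print(empirical_p_value(2, null_counts))
    # Poisson tail for an expected count of 0.61 (illustrative value)
    print(poisson_p_value(2, 0.61))
```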
I have the same problem, with a slight variation. Suppose my motif is GATAAAG, a 7-mer, and the sequence length is 1000 bp. As you describe, I can get the expected number of occurrences of such a 7-mer within 1000 bp. As I understand it, this covers every possible arrangement of those bases. If I need the expected count for exactly the order of bases given above, how do we modify the calculation?