Question

How to calculate the occurrence of a stretch of nucleotides in a genome?

3.9 years ago

marongiu.luigi ▴ 730

Is there a simple formula to calculate the probability of finding a given sequence of nucleotides in a target sequence? I have seen this formula:

a = (g/2)^(G+C) × ((1-g)/2)^(A+T),

where:

a = probability
g = G+C content of the target genome
G+C = number of G and C in the stretch
A+T = number of A and T in the stretch.
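The formula above can be sketched in a few lines of Python. This is only an illustration: the function name `stretch_probability` is made up for this example, and the model assumes that bases are independent and that G and C (and A and T) are equally frequent. The call uses g = 0.58, the value that appears in the arithmetic further down; pass 0.508 to use the stated 50.8% instead.

```python
def stretch_probability(stretch: str, gc_content: float) -> float:
    """Probability of the stretch at one fixed position under the formula above.

    Hypothetical helper: assumes independent bases, with G and C each at
    frequency gc_content / 2 and A and T each at (1 - gc_content) / 2.
    """
    stretch = stretch.upper()
    n_gc = sum(stretch.count(base) for base in "GC")
    n_at = sum(stretch.count(base) for base in "AT")
    if n_gc + n_at != len(stretch):
        raise ValueError("stretch contains characters other than A, C, G, T")
    return (gc_content / 2) ** n_gc * ((1 - gc_content) / 2) ** n_at


primer = "GTGTCCATTTATACGGACATCCATG"  # 11 G/C and 14 A/T
a = stretch_probability(primer, 0.58)
print(f"a = {a:.3g}")
```

Ordinary double-precision floats handle numbers of this size without trouble, so no special library is needed.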

I tried to calculate the occurrence of a primer targeting E. coli, GTGTCCATTTATACGGACATCCATG, as follows. The GC content of E. coli is 50.8%, thus:

a = (0.58/2)^11 × (0.42/2)^14 = 1.22×10^-6 × 3.24×10^-10 = 3.95×10^-16

and the number of occurrences is:

n = 3.95×10^-16 × 16×10^6 = 6.32×10^-9

It looks to me as though the primer should not occur at all in the E. coli genome (which is fine for a primer, given that it should be present at most once in a genome). Is the formula correct? Or is there a simpler one that does not require raising numbers to powers of a dozen or more (here I had to use R to get an answer because a scientific calculator could not handle it...)?
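One way to sidestep the very small intermediate numbers is to work with base-10 logarithms, which turn the powers into multiplications and the products into sums. The sketch below is not a different formula, just the same one rearranged; the function name `log10_expected_hits` is hypothetical, and the inputs (11 G/C, 14 A/T, g = 0.58, length 16×10^6) are simply the figures used in the calculation above.

```python
import math


def log10_expected_hits(n_gc: int, n_at: int, gc_content: float,
                        genome_length: float) -> float:
    """log10 of the expected number of occurrences, n = a * genome_length.

    Hypothetical helper; same independence assumptions as the formula above.
    """
    log_a = (n_gc * math.log10(gc_content / 2)
             + n_at * math.log10((1 - gc_content) / 2))
    return log_a + math.log10(genome_length)


log_n = log10_expected_hits(11, 14, 0.58, 16e6)

# Split the logarithm into exponent and mantissa for scientific notation.
exponent = math.floor(log_n)
mantissa = 10 ** (log_n - exponent)
print(f"log10(n) = {log_n:.3f}")
print(f"n ≈ {mantissa:.2f} × 10^{exponent}")
```

Every intermediate value here stays between roughly -16 and 8, so the same steps can be done on a calculator that only has a log key.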

Thank you.

genome • 676 views