How to calculate the occurrence of a stretch of nucleotides in a genome?
3.9 years ago

Is there a simple formula to calculate the probability of finding a given sequence of nucleotides in a target sequence? I have seen this formula:

a = (g/2)^(G+C) × ((1-g)/2)^(A+T),

where:

a = probability
g = G+C content of the target genome
G+C = number of G and C in the stretch
A+T = number of A and T in the stretch.
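
To make the formula concrete, here is a minimal Python sketch of it. The function names `count_gc_at` and `match_probability` are made up for illustration, and the model assumes every base is drawn independently, with G and C each having frequency g/2 and A and T each having frequency (1-g)/2.

```python
def count_gc_at(seq):
    """Return (number of G or C, number of A or T) in a DNA string."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    at = sum(seq.count(base) for base in "AT")
    if gc + at != len(seq):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return gc, at


def match_probability(seq, g):
    """Probability that a random position matches seq exactly.

    g is the G+C fraction of the target genome (0 < g < 1).
    Assumes independent bases: P(G) = P(C) = g/2, P(A) = P(T) = (1-g)/2.
    """
    gc, at = count_gc_at(seq)
    return (g / 2) ** gc * ((1 - g) / 2) ** at


# Toy example: 4 bases, 50% GC, so every base has probability 0.25.
print(match_probability("GATC", 0.5))  # 0.25**4 = 0.00390625
```

Real genomes are not independent-base sequences, so this is only a rough background estimate.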

I tried to calculate the expected occurrence of a primer targeting E. coli, GTGTCCATTTATACGGACATCCATG, as follows. The GC content of E. coli is 50.8%, thus:

a = (0.58/2)^11 × (0.42/2)^14 = 1.22×10^-6 × 3.24×10^-10 = 3.95×10^-16

and the number of occurrences is:

n = 3.95×10^-16 × 16×10^6 = 6.32×10^-9
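
The step from a per-position probability to a number of occurrences can be sketched as below. The values 3.95×10^-16 and 16×10^6 are taken from the calculation above as written. The function names and the `strands` parameter are hypothetical. The sketch also assumes a linear sequence, so a primer of length L has N - L + 1 possible start positions, and it uses a Poisson approximation for the chance of at least one hit.

```python
import math


def expected_occurrences(p, genome_length, primer_length, strands=1):
    """Expected number of exact matches under the independent-base model."""
    # Number of windows of primer_length that fit in a linear sequence.
    start_positions = genome_length - primer_length + 1
    return p * start_positions * strands


def prob_at_least_one(expected):
    """Poisson approximation: P(at least one hit) = 1 - exp(-expected)."""
    # expm1 keeps precision when the expected count is tiny.
    return -math.expm1(-expected)


p = 3.95e-16  # per-position probability from the calculation above
n = expected_occurrences(p, genome_length=16_000_000, primer_length=25)
print(f"expected occurrences: {n:.3g}")
print(f"P(at least one hit):  {prob_at_least_one(n):.3g}")
```

For a 25-mer the correction for the primer length is negligible, so this gives essentially the same value as multiplying by the sequence length directly.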

It looks to me as if the primer should not occur at all in the E. coli genome (which is fine for a primer, given that it should be present at most once in a genome). Is the formula correct? Or is there a simpler one that does not require raising numbers to powers of a dozen or so (here I had to use R to get an answer because a scientific calculator could not handle it)?
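
One way around the very small powers is to work with logarithms, since log10(a) = (G+C) × log10(g/2) + (A+T) × log10((1-g)/2) needs only multiplication and addition. A minimal Python sketch follows, with made-up function names and the counts from the example above (11 G/C and 14 A/T). The calculation above uses 0.58 while the stated GC content is 50.8%, so g is left as a parameter and both values are shown.

```python
import math


def log10_match_probability(gc_count, at_count, g):
    """log10 of (g/2)^gc_count * ((1-g)/2)^at_count, computed without tiny powers."""
    return gc_count * math.log10(g / 2) + at_count * math.log10((1 - g) / 2)


def to_scientific(log10_value):
    """Split a log10 value into (mantissa, exponent) with 1 <= mantissa < 10."""
    exponent = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exponent)
    return mantissa, exponent


for g in (0.58, 0.508):
    log_a = log10_match_probability(11, 14, g)
    mantissa, exponent = to_scientific(log_a)
    print(f"g = {g}: log10(a) = {log_a:.3f}, a = {mantissa:.2f}e{exponent}")
```

The same sum of logarithms can be done by hand on a scientific calculator, which avoids underflow entirely.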

Thank you.
