Comparative Genomic Analysis Of Bacterial Field Strains
13.1 years ago
Sanjukta ▴ 20

Hi, I have a question on comparative genomics of bacterial field strains.

If a comparative genomics study of several bacterial field strains, sampled over a long period of time, finds only a very low number of variants (just 30 base substitutions, indels, etc.) together with several major gene gains/losses, how should the result be interpreted?

Thanks

comparative indel snp • 3.7k views

I'm never convinced that the term "SNP" makes much sense in the context of bacterial genomes. Bacterial genomes alter with every round of replication - are there really "reference genomes" in the same sense as eukaryotes? Just a thought.


Neil, I think Sanjukta should have used the more neutral term variant/mutation or base substitution. I agree that the term SNP doesn't seem to fit here.


I cleaned this up, I hope it is better now.

13.1 years ago
Michael 55k

First, I would consider whether your analysis could be flawed, because that number of mutations seems very low. Variant detection needs sufficient sequencing coverage, and you might have missed a lot.

So question 1 back to you: how high is your coverage, was it re-sequencing, and how was the variant calling carried out?

Question 2: what do you mean by a "long period of time"? Please specify the time since separation (in years, days, hours, whatever).

If the data are real, that could mean one of a few things:

  • the mutation rate is low, or
  • the proportion of neutral to constrained sites is small, or
  • the generation time is very long, or
  • the populations are not as 'cleanly separated' as you might think; they could mix.

Just a very naive calculation, which doesn't claim to be precise or correct at all.

If most changes seen during molecular evolution are neutral, then fixations in a population will accumulate at a clock-rate that is equal to the rate of neutral mutations in an individual.

Say the average mutation rate is 1E-8 per base per generation (that's about the value that is often used) and the genome size is 3 Mbase. Then the expected number of neutral mutations per generation is 1E-8 * p * 3E6 ~ p * 0.03, where p is the estimated proportion of neutral (unconstrained) sites. At that rate it would take only about 30 / (p * 0.03) ~ 1000/p generations to generate these mutations. If I set p to between 10% and 50% (the range depends on how much of the genome is estimated to be constrained: coding + promoter + RNA vs. non-constrained), it would take between 2000 and 10000 generations to get this number of mutations.
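
The same back-of-the-envelope estimate can be written as a few lines of Python. This is only a sketch of the naive calculation above, with the same assumptions (a fixed per-base rate, a 3 Mbase genome, and a guessed neutral fraction p); the function name `expected_generations` and its parameter names are made up for illustration.

```python
def expected_generations(n_variants, mu, genome_size, p_neutral):
    """Generations needed to accumulate n_variants neutral mutations.

    mu          -- assumed mutation rate per base per generation
    genome_size -- genome length in bases
    p_neutral   -- assumed proportion of sites that are free to vary
    """
    if not 0 < p_neutral <= 1:
        raise ValueError("p_neutral must be in (0, 1]")
    # Expected neutral mutations per generation: mu * p * genome size
    per_generation = mu * genome_size * p_neutral
    return n_variants / per_generation


# Assumed values from the text: 30 variants, 1E-8 per base, 3 Mbase genome
for p in (0.1, 0.5):
    generations = expected_generations(30, 1e-8, 3e6, p)
    print(f"p = {p}: about {generations:.0f} generations")
```

Changing `mu` or `p_neutral` shows how sensitive the answer is to those two guesses, which is why the timescale and the coverage matter so much here.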


At a minimum, you'd need to estimate the proportion of sites under constraint (which could be >50% in a bacterial genome -- ~2/3 of coding sites which occupy 2/3 of genome + ~1/3 of noncoding sites) and subtract these sites from those free to generate variants. All of this is moot really without more information on the timescale involved here, as you note above in your answer.
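
To make those fractions concrete, here is a small sketch that combines them. It assumes the rough figures quoted above are exact (2/3 of the genome coding, 2/3 of coding sites constrained, 1/3 of noncoding sites constrained); the variable and function names are hypothetical.

```python
from fractions import Fraction

# Rough figures from the comment above, treated as exact for illustration
coding_share = Fraction(2, 3)              # share of the genome that is coding
constrained_in_coding = Fraction(2, 3)     # constrained share of coding sites
constrained_in_noncoding = Fraction(1, 3)  # constrained share of noncoding sites


def constrained_fraction(coding, c_coding, c_noncoding):
    """Genome-wide proportion of sites under constraint."""
    return coding * c_coding + (1 - coding) * c_noncoding


constrained = constrained_fraction(
    coding_share, constrained_in_coding, constrained_in_noncoding
)
neutral = 1 - constrained  # sites left free to generate variants

print(f"constrained: {constrained} (~{float(constrained):.2f})")
print(f"neutral:     {neutral} (~{float(neutral):.2f})")
```

Under these assumed figures a bit more than half of the genome comes out as constrained, which is the ">50%" mentioned above.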


@Michael - your calculation assumes that all sites in the genome are free to mutate at the same rate as unconstrained silent sites. Constrained coding and noncoding sites will have lower levels of neutral variation than silent sites, so you can't extrapolate over the entire genome so easily.


Casey, that's why I wrote that this calculation is 'approximate' (I even called it 'naive'). I see what you mean, but doesn't that calculation still lead to the correct conclusion that the number of mutations is relatively low? What would be a better calculation then?


Thank you Casey. I edited my calculation a bit to reflect this, but I agree it is still a very rough calculation.
