SNP Distribution On A Chromosome
4
5
14.1 years ago
Andrea_Bio ★ 2.8k

Hi

In this question I'm just asking for a really rough guesstimate to check some analysis I am running.

If you had a chromosome that contained about 11,000 exons and 600,000 SNPs, how many of those SNPs would you expect to fall in an exon?

I only found 124 in exons.

Any logic you use to make your guess would be useful to know.

Many thanks

snp • 4.7k views
9
14.1 years ago
brentp 24k

If the SNPs are distributed randomly, then the expected number of SNPs in exons would be (with all numbers per-chromosome):

(base-pairs-of-exon / total-base-pairs) * total-SNPS
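
A minimal Python sketch of the formula above, assuming exons are given as half-open (start, end) intervals. The function names and all the numbers in the example are made up for illustration; they are not the asker's data. Overlapping exons are merged first so that shared bases are counted once.

```python
def merged_length(intervals):
    """Total bases covered by half-open (start, end) intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the previous merged block.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def expected_exon_snps(exon_bp, chromosome_bp, total_snps):
    """Expected exon SNP count if SNPs fall uniformly at random along the chromosome."""
    if chromosome_bp <= 0:
        raise ValueError("chromosome_bp must be positive")
    return exon_bp * total_snps / chromosome_bp


# Hypothetical toy input, not real annotation.
toy_exons = [(0, 100), (50, 150), (200, 300)]
toy_exon_bp = merged_length(toy_exons)
print(toy_exon_bp, expected_exon_snps(toy_exon_bp, 10_000, 600))
```

Keep in mind that this is only the null expectation under uniform SNP placement.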
1

Although this is a very simple way of calculating the raw expected number of SNPs in a region, it ignores several factors that can shift the result. Among others, exons are less prone to variability, so their SNP density should be lower than their length alone predicts: lower than in intronic regions, and lower still than in intergenic regions. For that reason I would say that the figures you are comparing (124 observed versus 175 expected) are definitely consistent.

0

Brilliant, this predicted 175 and I got 124.

0

Good point. It only holds for randomly distributed SNPs. See @Eric's comment on @Istvan's answer as well.

5
14.1 years ago

Using Brent's formula, you should probably compute the expected number of SNPs that fall inside exons and the expected number that fall outside them. These are the expected counts. Then compute the same two counts for the observed data. Finally, use a chi-square test to compute a p-value that tells you whether the observed counts differ significantly from the expected ones.
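
As a rough sketch of that test in plain Python: the 124 observed and 175 expected exon SNPs are the figures quoted elsewhere in this thread, and the total of 600,000 is the asker's approximate number, so treat the result as illustrative only. The function name is hypothetical. With two bins there is one degree of freedom, and for that case the chi-square tail probability can be written with the complementary error function, so no statistics library is needed.

```python
import math


def chi_square_two_bins(observed_in, total, expected_in):
    """Goodness-of-fit test for two bins (exon / non-exon), 1 degree of freedom."""
    observed = [observed_in, total - observed_in]
    expected = [expected_in, total - expected_in]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Figures from this thread; the total is approximate.
chi2, p = chi_square_two_bins(observed_in=124, total=600_000, expected_in=175)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

A small p-value here only says the counts differ from the uniform-placement expectation; it does not say why.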

0

Good idea. It also depends on the actual coverage for exon vs. non-exon regions, which affects the ability to make SNP calls. You might expect non-exon regions to have more simple sequence and therefore to be less likely to have good enough coverage for proper SNP calling.

0

I don't know if this overlaps with your suggestion, @brentp, but I would expect non-exons, under an assumption of relaxed selection pressure, to contain more variation than exons, thus boosting the proportion of SNPs in the non-exon portions of the chromosome. A possible explanation for the lower-than-expected proportion of exon SNPs?

0

@Istvan - are there any tools that will calculate this automatically? I would be especially interested in any tools that take into account the factors Eric mentioned about selection pressure.

0

@Andrea_Bio I am not aware of a tool that would compute this for you right away, but it is likely that one exists in one form or another. I would imagine that this question is fairly common.

4
14.1 years ago
Bio_X2Y ★ 4.4k

I've just read a paper describing the DNA sequencing of an Irish individual (11x coverage), and the paper's Table 2 provides a breakdown of the locations of SNPs that they found (comparing their genome to the human reference genome).

http://genomebiology.com/2010/11/9/R91

Table 2:

  • Essential_splice_site - 0.0043%
  • Stop_gained - 0.0034%
  • Stop_lost - 0.0007%
  • Non_synonymous_coding - 0.3263%
  • Splice_site - 0.064%
  • Synonymous_coding - 0.3129%
  • Within_mature_mirna - 0.001%
  • Within_non_coding_gene - 0.5282%
  • 5prime_utr - 0.1471%
  • 3prime_utr - 0.6283%
  • Intronic - 34.6666%
  • Other - 63.317% (clarified in the paper as being Intergenic)

If you consider the first 10 categories to be "exon SNPs", then 2.02% of SNPs occur within exons.

You found 124 of about 600,000 SNPs in exons, which is roughly 0.02%. That implies they saw about 100 times more SNPs in exons than you did, assuming I'm comparing like with like!
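
For anyone who wants to check the arithmetic, here is a short Python sketch that sums the first 10 categories of Table 2 as listed above and compares the result with the asker's figures (124 exon SNPs out of roughly 600,000, which is approximate). The variable names are just for illustration.

```python
# Percentages copied from the Table 2 list above (first 10 categories only).
table2_exon_percent = {
    "Essential_splice_site": 0.0043,
    "Stop_gained": 0.0034,
    "Stop_lost": 0.0007,
    "Non_synonymous_coding": 0.3263,
    "Splice_site": 0.064,
    "Synonymous_coding": 0.3129,
    "Within_mature_mirna": 0.001,
    "Within_non_coding_gene": 0.5282,
    "5prime_utr": 0.1471,
    "3prime_utr": 0.6283,
}

paper_exon_percent = sum(table2_exon_percent.values())

# Asker's figures from the question; the total is approximate.
asker_exon_percent = 124 / 600_000 * 100

fold_difference = paper_exon_percent / asker_exon_percent
print(f"paper: {paper_exon_percent:.2f}%  asker: {asker_exon_percent:.4f}%  "
      f"fold difference: {fold_difference:.0f}x")
```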

1
14.1 years ago
Ketil 4.1k

I think it's fairly obvious that you would find fewer SNPs in exons than in introns and intergenic regions; after all, genes tend to be conserved. As Brent suggests, the sizes are what matter.

You could use the data from Bio_X2Y and adapt them to the sizes in your genome, but I expect the actual numbers to depend on a lot of factors: population size, evolutionary pressures, mutation rate, type of organism...

0

I asked this question in another comment below, but I hadn't seen your answer then. Is there any software for calculating SNP distribution that takes into account the factors you have mentioned, such as mutation rate, population size, and organism? I would be very interested in this.

