There is a nice review, published just today, summarizing some numbers about genetic variability in the human genome:
Since there has been a discussion on this matter in this thread, I'll summarize some of these numbers here. The paper contains many references, which I am too lazy to copy here... So, if you want to know more, just read the paper.
It is important to note that there is only slightly more diversity between individuals of two major continental groups than between individuals of the same population. Craig Venter and Jim Watson (both "Caucasian") share fewer SNVs with each other than either of them shares with Seong-Jin Kim (a Korean scientist).
General Chimp/Human differences:
- number of single-nucleotide differences between chimp and human: ~35 million (plus ~5 million insertion/deletion events)
- percentage of single-nucleotide changes between chimp and human: 1.23%
- percentage of single-nucleotide changes that are fixed in chimp or in human: 1.06% out of the 1.23%
Variability between individuals:
- number of SNVs in humans: ~65 million
- on average, a pair of humans is expected to differ at 1 base in every 1,000
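As a quick sanity check on the figures above, here is a back-of-the-envelope sketch. The genome size of ~3.1 Gb is my own assumption (it is not a number from the review), and the function names are just illustrative.

```python
# Assumption: approximate haploid human genome size; not taken from the review.
HUMAN_GENOME_BP = 3.1e9


def expected_pairwise_differences(genome_bp, diff_rate):
    """Expected number of differing bases between two genomes."""
    return genome_bp * diff_rate


def implied_aligned_bases(n_differences, fraction_different):
    """Number of compared bases implied by a difference count and a rate."""
    return n_differences / fraction_different


# 1 difference every 1,000 bases between two humans
pairwise = expected_pairwise_differences(HUMAN_GENOME_BP, 1 / 1000)

# ~35 million chimp/human differences at 1.23% divergence
aligned = implied_aligned_bases(35e6, 0.0123)

print(f"Expected differences between two humans: ~{pairwise / 1e6:.1f} million")
print(f"Aligned bases implied by the chimp/human numbers: ~{aligned / 1e9:.2f} Gb")
```

Under that assumption, two people differ at roughly 3 million positions, and the chimp/human numbers imply a bit under 3 Gb of aligned sequence, so the two bullets are at least consistent with each other.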
Fst, pre-1000 Genomes (Fst is a measure of genetic differentiation between populations; 0 -> no differentiation; 1 -> highest genetic differentiation):
- average Fst among major continental groups ranges from 0.05 to 0.13
- genetic diversity in humans is far lower than in other primates. Genetic variance in humans is only 5-13% of the variance in other primates.
Fst, after 1000 Genomes
- Fst between Africans and Europeans: 0.071
- Fst between Africans and Asians: 0.083
- Fst between Asians and Europeans: 0.052
- Fst in Gorilla populations: 0.38
- Fst in Chimp: 0.32
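To give a feeling for what these Fst values mean, here is a minimal sketch of the textbook heterozygosity-based definition, Fst = (Ht - Hs) / Ht, for a single biallelic site and equally weighted populations. This is only an illustration: I am not claiming it is the estimator used in the review or by the 1000 Genomes Project, and the function name and the example frequencies are made up.

```python
def fst_biallelic(allele_freqs):
    """Fst for one biallelic site from per-population allele frequencies.

    Uses Fst = (Ht - Hs) / Ht, where Hs is the mean within-population
    expected heterozygosity and Ht is the expected heterozygosity of the
    pooled population. Populations are weighted equally (an assumption).
    """
    n = len(allele_freqs)
    # Mean expected heterozygosity within each population: 2p(1 - p)
    hs = sum(2 * p * (1 - p) for p in allele_freqs) / n
    # Expected heterozygosity at the mean allele frequency
    p_bar = sum(allele_freqs) / n
    ht = 2 * p_bar * (1 - p_bar)
    if ht == 0:
        # Site is monomorphic everywhere; Fst is undefined, report 0 here
        return 0.0
    return (ht - hs) / ht


# Hypothetical allele frequencies, for illustration only
print(fst_biallelic([0.5, 0.5]))  # identical frequencies
print(fst_biallelic([0.4, 0.6]))  # modest frequency difference
print(fst_biallelic([0.0, 1.0]))  # fixed for different alleles
```

Identical frequencies give 0, populations fixed for different alleles give 1, and a 0.4 vs 0.6 split gives about 0.04, which is in the same ballpark as the human continental values listed above.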
Allele sharing across continents
- 81.2% of SNVs are present in all continental groups (12.4% if we consider haplotypes instead of SNVs)
- less than 1% of SNVs are specific to a continent (11% if we consider haplotypes instead of SNVs)
- only 0.06% of SNVs are specific to Eurasia
Haplotype Sharing across continents
- 2% of haplotype blocks are restricted to Asia
- 2% of haplotype blocks are restricted to Europe
- 25% of haplotype blocks are restricted to Africa
Major Genetic Groups
- According to Rosenberg et al. (using the Structure software), all human individuals can be classified into 5 continental groups
- Li et al. confirmed the same on the HGDP panel
- however, more recent attempts failed to confirm this classification, and claim that it may be due to confounding effects.
- races according to the US census system: 15 plus "other races". I recommend reading this book if you are interested in the question of race in scientific use
"3": Never write code after 3 beers.
Correct. Two is optimal. Relevant xkcd: http://xkcd.com/323/
At this point, there's always a relevant xkcd :)
Might be some relevant numbers here: Bioinformatics "Cheat Sheet"
Might want to use the "cost" price of sequencing technologies, because prices vary significantly according to country/continent/provider etc (cost price for a HiSeq lane is quite a bit lower than $2,500)
we should ask this question every 1-2 years, and see how these numbers change
Is 350 Mreads realistic for HiSeq? My lab usually counts on 200 Mreads, but we also do lots of different kinds of samples, so we can't count on every library behaving completely consistently, which maybe some groups can.
I wonder if it refers to single-end reads - I thought Illumina's reference value was a minimum of somewhere around 140M paired end reads?! (even though you sometimes get much more, of course)
I'll look around and then update. I'm not confident on those numbers. I would call 140M read-pairs 280M reads.
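For what it's worth, the reads vs. read-pairs bookkeeping can be sketched like this. The 100 bp read length and the ~3.1 Gb genome size are my own placeholder assumptions, not values from this thread or from Illumina, and the helper names are hypothetical.

```python
def pairs_to_reads(read_pairs):
    """Each paired-end fragment yields two reads."""
    return 2 * read_pairs


def mean_coverage(n_reads, read_length_bp, genome_bp):
    """Average depth if all reads map uniformly across the genome."""
    return n_reads * read_length_bp / genome_bp


READ_LENGTH_BP = 100       # assumption: placeholder read length
GENOME_BP = 3_100_000_000  # assumption: approximate human genome size

reads = pairs_to_reads(140_000_000)  # 140M read pairs from the comment above
coverage = mean_coverage(reads, READ_LENGTH_BP, GENOME_BP)

print(f"{reads / 1e6:.0f}M reads, ~{coverage:.1f}x mean coverage")
```

So whether a lane is quoted at 140M or 280M depends on counting pairs or individual reads, which is worth stating explicitly next to any such number.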
42, for geeky jokes at parties that nobody finds funny