How Much Of The Genome Is Captured By A GWAS?
13.4 years ago
K_Star ▴ 120

How much of the genome is 'captured' in a GWAS with 300k, 500k or 1,000k SNPs? And where are most of the tagging SNPs located? Are they mostly in the exome?

gwas • 5.7k views

Which genome?


Yes, an important point, as GWAS are conducted in both human and non-human species. Plant GWAS are really cool, for example.


Sorry, the human genome.

13.4 years ago

The experimental basis of GWAS is genotyping. SNP genotyping arrays enable rapid scanning of 0.3M, 0.5M or 1M genetic markers (SNPs) to find genetic variants associated with complex diseases or traits. A GWAS needs a large number of markers and a large number of subjects to get a reliable signal, and associations should reach high statistical significance. For a detailed overview of recent advances in GWAS, refer to another discussion here.

How much of the genome is 'captured' in a GWAS with 300k, 500k or 1,000k SNPs?

The human genome carries roughly 1 SNP per 100-300 bp; with ~3 Gb of sequence, that is on the order of 10 million SNPs. Genotyping that many markers in every subject is not practical because of several limiting factors. To deal with this we can use linkage disequilibrium (LD) mapping (see section on D', recombination rate), haplotypes, haplotype blocks and haplotype tag SNPs (tagSNPs). (Read about the HapMap project here.) Instead of genotyping all 10M SNPs, we can genotype the tagSNPs in each haplotype block. A tagSNP is a representative SNP for a region of the genome with high LD, so genotyping it lets us detect genetic variation in that region without genotyping every SNP there. Previous studies indicated that genotyping chips with 0.5M-1M SNPs are sufficient for a good GWAS.
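To make the tagSNP idea concrete, here is a minimal sketch of greedy tag selection. It is an illustration only, not the procedure used by HapMap or by any array vendor. It assumes genotypes coded as 0/1/2 allele dosages and uses the squared Pearson correlation of dosages as a stand-in for the haplotype-based r^2; the SNP ids and the function names `r_squared` and `greedy_tag_snps` are made up for the example.

```python
from itertools import combinations

def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def greedy_tag_snps(genotypes, threshold=0.8):
    """Pick tag SNPs so that every SNP is a tag or has r^2 >= threshold with one.

    genotypes: dict mapping SNP id -> list of dosages, one entry per subject.
    """
    ids = list(genotypes)
    # captures[s] = SNPs (including s itself) that s would tag at the threshold
    captures = {s: {s} for s in ids}
    for a, b in combinations(ids, 2):
        if r_squared(genotypes[a], genotypes[b]) >= threshold:
            captures[a].add(b)
            captures[b].add(a)
    untagged = set(ids)
    tags = []
    while untagged:
        # take the SNP that captures the most still-untagged SNPs
        best = max(ids, key=lambda s: len(captures[s] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return tags

# Hypothetical toy data: 4 SNPs genotyped in 6 subjects
toy = {
    "rs_a": [0, 1, 2, 0, 1, 2],
    "rs_b": [0, 1, 2, 0, 1, 2],  # identical to rs_a
    "rs_c": [2, 1, 0, 2, 1, 0],  # opposite allele coding, still r^2 = 1 with rs_a
    "rs_d": [0, 0, 1, 1, 2, 2],  # only weakly correlated with the others
}
print(greedy_tag_snps(toy))  # ['rs_a', 'rs_d']
```

In this toy set two tags stand in for four SNPs. On real data the same logic is usually restricted to SNP pairs within a physical window, since LD decays with distance and all-pairs comparison does not scale to millions of markers.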

And where are most of the tagging SNPs located?

The basic assumption here is that the genotyped SNPs cover all LD blocks. You can get further details from the documentation for the Illumina Human 660W-Quadv1_A or the Affymetrix 500K Gene Chip.

Are they mostly in the exome?

No. Tagging SNP selection is not biased towards the exome. Most GWAS hits are in intergenic or promoter regions, or in regions distal to exons.


Thank you Larry, and in particular, Khader for the informative responses.

The answer that I am looking for, then, is how many of the estimated 10 million SNPs are captured by each of the aforementioned SNP arrays, for example in a CEU cohort.
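One way to make this question computable, as a sketch only: given a reference panel for the population (for example CEU) with precomputed pairwise r^2 values, count a reference SNP as captured if it is on the array or has r^2 at or above a threshold with at least one array SNP. The function name `array_coverage`, the SNP ids and the r^2 values below are all hypothetical, and no real array's coverage is implied.

```python
def array_coverage(reference_snps, array_snps, pairwise_r2, threshold=0.8):
    """Fraction of reference SNPs that are on the array or tagged by an array SNP.

    pairwise_r2: dict mapping (snp1, snp2) -> r^2 measured in the reference panel.
    """
    reference = set(reference_snps)
    on_array = set(array_snps) & reference
    captured = set(on_array)
    for (a, b), r2 in pairwise_r2.items():
        if r2 >= threshold:
            # a SNP is captured only through a partner that is actually genotyped
            if a in on_array:
                captured.add(b)
            if b in on_array:
                captured.add(a)
    captured &= reference
    return len(captured) / len(reference)

# Hypothetical reference panel of 5 SNPs and an array carrying 2 of them
reference = ["rs1", "rs2", "rs3", "rs4", "rs5"]
array = ["rs1", "rs4"]
r2 = {("rs1", "rs2"): 0.95, ("rs2", "rs3"): 0.85, ("rs4", "rs5"): 0.60}

print(array_coverage(reference, array, r2))                 # 0.6
print(array_coverage(reference, array, r2, threshold=0.5))  # 0.8
```

Note that rs3 is not counted: it is in LD with rs2, but rs2 itself is not on the array. The answer also moves with the chosen r^2 threshold, so any coverage figure should be quoted together with the threshold and the reference population.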

13.4 years ago

Excellent overview provided by Khader. I'll add a couple points:

Some platforms capture SNPs in LD better than others. Illumina is better in this regard, but the new version from Affy makes up for this deficiency. Size matters too: more SNPs give better LD coverage.

Population differences. Some populations will not be as well interrogated by available arrays as others. This is because many sites that are polymorphic in one population may not be variable in another, or may be at so low a frequency that they were not included on the array. This is not a huge problem, but it can be important for some genomic regions. Think of the extreme: SNPs private to my family are not likely to be on any array because they have not been seen before.
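A small sketch of the population point, with made-up dosages: the same SNP can be common in one population and monomorphic in another, in which case it carries no information there. The function names and the 0.05 minor allele frequency cutoff are assumptions for illustration.

```python
def minor_allele_frequency(dosages):
    """Minor allele frequency from 0/1/2 allele dosages in diploid subjects."""
    freq = sum(dosages) / (2 * len(dosages))
    return min(freq, 1 - freq)

def informative_in(populations, min_maf=0.05):
    """Map each population to True if the SNP reaches min_maf there."""
    return {
        pop: minor_allele_frequency(dosages) >= min_maf
        for pop, dosages in populations.items()
    }

# Hypothetical dosages for one SNP in two populations of 10 subjects each
snp = {
    "pop_A": [0, 1, 2, 1, 0, 1, 1, 0, 2, 1],  # 9 of 20 alleles, MAF 0.45
    "pop_B": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # monomorphic, MAF 0
}
print(informative_in(snp))  # {'pop_A': True, 'pop_B': False}
```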

Another way to word your question: Of all LD blocks defined by r^2 = 1.0 (or 0.9 or 0.8, etc.) and containing n SNPs (where n > 0, or n > 1, or ...), how many are represented on an array? That's a tough question, and the answer depends on the population under study. We do GWAS and study several different populations, and we have not put the effort into this calculation. To us it is not a high priority, because we use the platforms and data we have, engage in careful analysis, and report our findings. If a more complete array or analysis comes along later, so be it.
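Once LD blocks have been defined for a population, the counting step in this rewording is simple; the hard part is defining the blocks. A minimal sketch, assuming the blocks are already available as sets of SNP ids (the block ids, SNP ids and the function name `blocks_represented` are hypothetical):

```python
def blocks_represented(ld_blocks, array_snps, min_size=1):
    """Return (blocks with at least one array SNP, blocks considered).

    ld_blocks: dict mapping block id -> set of SNP ids in that block.
    min_size:  ignore blocks with fewer than this many SNPs (the "n" above).
    """
    array = set(array_snps)
    eligible = {b: set(snps) for b, snps in ld_blocks.items() if len(snps) >= min_size}
    hit = [b for b, snps in eligible.items() if snps & array]
    return len(hit), len(eligible)

# Hypothetical blocks and array content
blocks = {
    "block1": {"rs1", "rs2", "rs3"},
    "block2": {"rs4"},
    "block3": {"rs5", "rs6"},
}
array = {"rs2", "rs6", "rs9"}

print(blocks_represented(blocks, array))              # (2, 3)
print(blocks_represented(blocks, array, min_size=2))  # (2, 2)
```

Rerunning this with blocks built at different r^2 cutoffs and in different populations would give the table the question is really asking for.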


Thanks for these important points Larry.

