I'm reading papers that develop approaches for detecting rare variants, especially since GWAS has failed to explain the "missing heritability". Surprisingly, although whole-genome sequencing of large numbers of samples is still far from affordable, numerous statistics and approaches have already been developed for rare variants using sequencing data.
Some of my readings are as follows:
http://www.ncbi.nlm.nih.gov/pubmed/18691683 (quite early and influential: combined multivariate and collapsing; that is, first bin the rare variants according to allele frequency and collapse them within each bin, then apply a multivariate test, to make use of the power of both approaches)
http://www.ncbi.nlm.nih.gov/pubmed/19214210 (sets a weight for each rare variant according to its frequency)
http://www.ncbi.nlm.nih.gov/pubmed/19810025 (uses a multiple regression model: the phenotype is the dependent variable and the collapsed rare variants are the independent variable)
http://www.ncbi.nlm.nih.gov/pubmed/21521787 (more recent; uses functional principal component analysis)
http://www.ncbi.nlm.nih.gov/pubmed/22262732 (more recent; takes sequencing quality into account)
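To make the collapsing and frequency-weighting ideas in the list above concrete, here is a minimal sketch in plain Python on toy data. It is only my reading of the general ideas, not the exact procedure of any of the papers: the function names, the 1% frequency threshold, the toy genotypes, and the specific weight formula (frequency estimated in controls, weight proportional to the binomial standard deviation) are all assumptions for illustration, so check the papers for the precise definitions.

```python
from math import sqrt

def collapse(genotypes, maf, threshold=0.01):
    """Collapsing: 1 if an individual carries a minor allele at ANY
    variant whose frequency is below `threshold` (assumed cutoff), else 0."""
    rare = [j for j, f in enumerate(maf) if f < threshold]
    return [int(any(row[j] > 0 for j in rare)) for row in genotypes]

def weighted_scores(genotypes, is_case):
    """Frequency weighting: each variant is down-weighted by an estimate of
    its standard deviation in controls, so rarer variants count more.
    The formula here is an assumption made for this sketch."""
    n = len(genotypes)
    controls = [row for row, case in zip(genotypes, is_case) if not case]
    n_controls = len(controls)
    n_variants = len(genotypes[0])
    weights = []
    for j in range(n_variants):
        minor_in_controls = sum(row[j] for row in controls)
        # Add-one smoothing keeps the frequency strictly between 0 and 1.
        q = (minor_in_controls + 1) / (2 * n_controls + 2)
        weights.append(sqrt(n * q * (1 - q)))
    # Per-individual score: weighted count of minor alleles in the region.
    return [sum(row[j] / weights[j] for j in range(n_variants))
            for row in genotypes]

# Toy data: rows are individuals, columns are variants in one gene,
# entries are minor-allele counts (0, 1 or 2). All values are made up.
genotypes = [
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 2],
    [0, 0, 1],
]
maf = [0.005, 0.008, 0.20]          # assumed population allele frequencies
is_case = [True, False, True, False]

carriers = collapse(genotypes, maf)
scores = weighted_scores(genotypes, is_case)
print(carriers)
print([round(s, 3) for s in scores])
```

In a real analysis the collapsed indicator or the per-individual score would then be compared between cases and controls (for example with a rank-based test and permutation for significance); that step is left out here to keep the sketch short.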
Since large-scale sequencing data is not available, most of these papers use only simulated data or sequencing data around certain genes. Can anyone share experience of using any of these approaches? Which one may be the best? It's very confusing and scary for beginners. We could also discuss how to improve such approaches for sequencing data. For example, unlike common variants genotyped on SNP arrays, rare variants from sequencing require us to take sequencing errors seriously into account. (That is why the last two approaches were developed.)
You should read this paper by Eric Lander: http://www.pnas.org/content/early/2012/01/04/1119675109 Its conclusion is that the "missing heritability" is overestimated. But I still think it is very important to use the new NGS data to study rare variants.
I was wondering if anyone recalls a paper that reviewed all of the major existing collapsing methods. I think it was published in Nature Reviews Genetics, but I've been unable to find it. Any help?
I guess you're referring to: "Statistical analysis strategies for association studies involving rare variants", http://www.nature.com/nrg/journal/v11/n11/full/nrg2867.html