Hello all,
There has been a lot of discussion in the genetics literature lately about rare variants (loosely defined as <1% population frequency) and their effect on complex traits. A paradigm that many labs are now pursuing with their favorite traits is to use NGS to identify novel variants and then genotype those variants in larger cohorts. Because statistical power is lower for rare variants, many different approaches have been proposed that aggregate variants for association analysis (as opposed to the single-variant, GWAS-type analysis). These methods were nicely reviewed here:
http://www.ncbi.nlm.nih.gov/pubmed/20940738
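To make the aggregation idea concrete, here is a minimal sketch of one of the simplest collapsing approaches: flag each sample as a carrier if it has at least one rare allele in the region, then compare carrier counts between cases and controls with Fisher's exact test. It is an illustration only, not the method of any particular paper. The function names, the 0/1/2 minor-allele coding, and the default 1% threshold are my own assumptions, and allele frequency is simply estimated from the pooled sample.

```python
from math import comb


def allele_freq(genotypes):
    """Frequency of the counted allele; genotypes are 0/1/2 copies per sample."""
    return sum(genotypes) / (2 * len(genotypes))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first row (cases).
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def collapsing_test(genotype_matrix, is_case, freq_threshold=0.01):
    """Collapse rare variants into a carrier flag and test it against status.

    genotype_matrix[v][s] is the minor-allele count (0/1/2) of variant v in
    sample s (assumed coding); is_case[s] is True for cases.
    Returns (case_carriers, control_carriers, p_value).
    """
    # Keep only variants that are observed and below the frequency threshold.
    rare = [g for g in genotype_matrix if 0 < allele_freq(g) < freq_threshold]
    carrier = [any(g[s] > 0 for g in rare) for s in range(len(is_case))]
    n_cases = sum(is_case)
    n_controls = len(is_case) - n_cases
    case_carriers = sum(1 for c, k in zip(carrier, is_case) if c and k)
    control_carriers = sum(carrier) - case_carriers
    p = fisher_exact_two_sided(case_carriers, n_cases - case_carriers,
                               control_carriers, n_controls - control_carriers)
    return case_carriers, control_carriers, p
```

A plain carrier test like this ignores covariates and treats all rare variants as acting in the same direction, which is exactly the kind of limitation the more elaborate methods try to address.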
So I have gone through the hurdles of sequencing a region of the genome in a collection of samples, selecting variants for genotyping, designing a custom array, and genotyping my collection of samples. Now I want to look for rare variant association. The review above contains references to a beautiful collection of methods and approaches, so I contacted many of the authors of these methods to ask if they could share their code with me to analyze my data. Well... sorry to say I did not receive many replies... So before I start coding these methods myself, my questions are:
If you have analyzed rare variants, can you recommend a tool or approach?
Are you aware of any comparison between these approaches?
Many thanks,
Nice question about an interesting field. Unfortunately, it might be hard, if not impossible, to get public data on this to simply play around with different methods. It would be very interesting for me and other users if you summed up your own experiences with the different methods and software in an answer to your own question.
Just a small comment... what do you mean by rare variants? Below 1% in frequency?
Thanks for your comment. The boundary between rare and common variants is not well defined, to my knowledge, but for the purpose of this question let's go with your suggestion of <1%.
The Nature review you mention states rare variants are "defined by convention as <1% frequency" although frequency "might range from <0.1% to <0.01% depending on the context".
Are you working with population- or family-based data?