Question

Rare Variant Association Analysis

10

13.4 years ago

Adrian Cortes ▴ 550

Hello all,

There has been a lot of discussion in the genetics literature lately on rare variants (loosely defined as <1% population frequency) and their effect on complex traits. A new paradigm that a lot of labs are pursuing at the moment with their favorite traits is to use NGS to identify novel variants and then take these variants on a genotyping ride with larger cohorts. Because we have less statistical power with rare variants, many different approaches have been proposed to aggregate variants for association analysis (as opposed to the single-variant, GWAS-type analysis). These methods were nicely reviewed here:

http://www.ncbi.nlm.nih.gov/pubmed/20940738

So I have gone through the hurdles of sequencing a region of the genome in a collection of samples, selecting variants for genotyping, designing a custom array, and genotyping my collection of samples. Now I want to look for rare variant associations. The review above contains references to a beautiful collection of methods and approaches, so I contacted many of the authors of these methods to ask if they could share their code with me to analyze my data. Well... sorry to say that I did not receive many replies. So before I start coding these methods myself, my questions are:

If you have analyzed rare variants, can you recommend a tool or approach?

Are you aware of any comparison between these approaches?

Many thanks,

gwas variant association • 12k views

updated 3.1 years ago by Ram 44k • written 13.4 years ago by Adrian Cortes ▴ 550

1

Nice question about an interesting field. Unfortunately, it might be hard or even impossible to get public data on this to simply play around with different methods. It would be very interesting for me and other users if you summed up your own experiences with different methods and software in an answer to your own question.

13.4 years ago by Michael 55k

0

Just a small comment... what do you mean by rare variants? Below 1% in frequency?

13.4 years ago by Thomas ▴ 760

0

Thanks for your comment. To my knowledge, the boundary between rare and common variants is not well defined, but for the purpose of this question let's go with your suggestion of <1%.

13.4 years ago by Adrian Cortes ▴ 550

0

The Nature review you mention states rare variants are "defined by convention as <1% frequency" although frequency "might range from <0.1% to <0.01% depending on the context".

13.4 years ago by Travis ★ 2.8k

0

Are you working with population- or family-based data?

13.4 years ago by K_Star ▴ 120

Ram · Answer 1 · 2011-10-20

There are more recent review papers (link1, link2) that cover even more methods for rare variant association studies. For example:

Single variants
- χ2 test for contingency tables
- Fisher’s exact test
- Cochran-Armitage test for trend
- others
Multiple variants
- Collapsing methods, such as CMC and CAST
- Aggregation methods, such as WSS and KBAC
- Bi-directional effect methods: C-alpha, logistic kernel-machine test, etc.

The authors of the first paper note that the choice of method depends on many factors, such as:

- Study design (whether the trait is quantitative or dichotomous)
- The assumed underlying genetics (only rare variants vs. both rare and common variants are expected to contribute to disease)
- Whether both protective and risk variants are expected

They didn't provide a recipe for choosing a method. Rather, they rightly point out that most of these methods rely on simulated data generated under certain assumptions about the genetic architecture of a specific trait, so they may not carry over to other scenarios.

I've tried KBAC, Fisher's exact test, and a few others on my samples (n < 100), a sample size that is underpowered for rare variant association. Each test outputs a different list of candidate genes. Without increasing the number of samples in my study, I have to go through the pain of looking up every candidate gene at the top of each list and checking for biological clues that might connect it to my phenotype. A lot of false hopes and not fun at all.

IMO, until someone publishes a large-scale project testing for rare variant association (number of samples >= 10,000), there won't be a favorite method that everyone agrees on. You probably need to try many of the published methods on your data and get a sense of what they can provide.
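To make the collapsing idea from the list above a bit more concrete, here is a minimal sketch of a CAST-style test: each sample is reduced to "carries at least one rare allele in the region" or not, and carrier status is compared between cases and controls with Fisher's exact test. The function names, the genotype coding (rare-allele counts per variant) and the toy data are all assumptions made for illustration. This is not the code of any published package, and it ignores covariates, weighting and frequency thresholds.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def collapse_carriers(genotypes):
    """CAST-style collapsing: 1 if a sample carries any rare allele, else 0."""
    return [int(any(g > 0 for g in sample)) for sample in genotypes]


def cast_test(genotypes, is_case):
    """Return the carrier-by-status table (a, b, c, d) and its p-value."""
    carrier = collapse_carriers(genotypes)
    a = sum(1 for k, y in zip(carrier, is_case) if k and y)          # case carriers
    b = sum(1 for k, y in zip(carrier, is_case) if not k and y)      # case non-carriers
    c = sum(1 for k, y in zip(carrier, is_case) if k and not y)      # control carriers
    d = sum(1 for k, y in zip(carrier, is_case) if not k and not y)  # control non-carriers
    return (a, b, c, d), fisher_exact_two_sided(a, b, c, d)


# Hypothetical toy data: rows are samples, columns are rare-allele counts
# at three variants in one region.
cases = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 2, 0],
         [0, 0, 0], [0, 0, 0], [0, 0, 0]]
controls = [[0, 0, 1]] + [[0, 0, 0] for _ in range(7)]

table, p_value = cast_test(cases + controls, [True] * 8 + [False] * 8)
print(table, round(p_value, 4))
```

With only 16 samples even a 5-versus-1 carrier split does not reach conventional significance, which matches the point above about small studies being underpowered.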

score 2 · Answer 2 · 2011-07-20

13.4 years ago

Nathan Nehrt ▴ 250

I have not personally used this tool, but I do know of CCRaVAT (Case-Control Rare Variant Analysis Tool), available for download from Sanger. It implements a collapsing-based approach for rare variant analysis.

score 2 · Answer 3 · 2011-07-20

13.4 years ago

Docroberson ▴ 310

I believe you can do this in PLINK-SEQ:

http://atgu.mgh.harvard.edu/plinkseq/

BUT beware that the last time I checked it out it was [1] still under development and [2] DID NOT yet have any source code available. It probably won't until the first full release and paper are out, to prevent getting scooped. I think it will let you do the C-alpha test on rare variants.
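For readers who want to see what the C-alpha test mentioned here computes, below is a rough sketch of the statistic as I understand it from Neale et al. (2011). For each variant, the number of copies seen in cases is compared with its binomial expectation, and the excess variance is summed over variants. The function name, inputs and toy counts are assumptions for illustration. This is not PLINK/SEQ's implementation, and it leaves out refinements such as special handling of singletons or permutation-based p-values.

```python
from math import comb, erfc, sqrt


def c_alpha(case_counts, total_counts, p0):
    """Sketch of the C-alpha statistic.

    case_counts[i]  -- copies of variant i observed in cases
    total_counts[i] -- copies of variant i observed in cases plus controls
    p0              -- null probability that a copy falls in a case
                       (for example, the fraction of samples that are cases)
    Returns (T, Z, one-sided p-value from a normal approximation).
    """
    t_stat = 0.0
    variance = 0.0
    for y, n in zip(case_counts, total_counts):
        null_var = n * p0 * (1 - p0)
        # Observed squared deviation minus its binomial expectation.
        t_stat += (y - n * p0) ** 2 - null_var
        # Null variance of that term under Binomial(n, p0).
        for u in range(n + 1):
            pmf = comb(n, u) * p0 ** u * (1 - p0) ** (n - u)
            variance += ((u - n * p0) ** 2 - null_var) ** 2 * pmf
    if variance == 0:
        raise ValueError("null variance is zero; no informative variants")
    z = t_stat / sqrt(variance)
    # Only an excess of variance (large positive Z) counts as evidence.
    p_value = 0.5 * erfc(z / sqrt(2))
    return t_stat, z, p_value


# Hypothetical toy region, balanced design (p0 = 0.5): two variants seen
# only in cases and two seen only in controls.
case_counts = [4, 0, 3, 0]
total_counts = [4, 4, 3, 3]
t_stat, z, p_value = c_alpha(case_counts, total_counts, p0=0.5)
print(t_stat, round(z, 3), p_value)
```

In this toy region exactly half of all rare-allele copies (7 of 14) fall in cases, so a simple burden count would see nothing, while the C-alpha statistic is large because individual variants are skewed in opposite directions. That is the situation bi-directional tests are meant for.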


0

We've used PLINK/SEQ for the purpose of comparing rare variant association methods - a subset of the gene/group-based tests have been implemented: http://atgu.mgh.harvard.edu/plinkseq/assoc.shtml

13.4 years ago by Brett Thomas ▴ 300

0

Hello Brett, I tried the different tests available in PLINK/SEQ and got differing results. In particular, I am interested in the two-sided tests, but C-alpha gives me no association while SUMSTAT does. I couldn't find a reference for the latter test. Do you know the difference? Did your results agree? Thanks

13.3 years ago by Adrian Cortes ▴ 550

score 2 · Answer 4 · 2013-10-28

11.2 years ago

Nino ▴ 20

There is an R package called AssotesteR (http://cran.r-project.org/web/packages/AssotesteR/AssotesteR.pdf) that performs most of the tests developed for rare variant association.

score 2 · Answer 5 · 2014-01-15

11.0 years ago

ben.bob ▴ 30

You could also try variant tools (varianttools.sourceforge.net), which implements more than 20 association test methods for rare variants. It also provides a whole set of tools to perform quality control and other tasks needed for a complete analysis pipeline.

score 0 · Answer 6 · 2011-07-20

13.4 years ago

K_Star ▴ 120

Are you working with population- or family-based data?


1

An important question indeed - but this is not an answer and more appropriately belongs as a comment under the original question.

13.4 years ago by Larry_Parnell 16k