Rare Variant Association Analysis
13.4 years ago
Adrian Cortes ▴ 550

Hello all,

There has been a lot of discussion in the genetics literature lately about rare variants (loosely defined as <1% population frequency) and their effect on complex traits. A paradigm that many labs are now pursuing with their favorite traits is to use NGS to identify novel variants and then genotype these variants in larger cohorts. Because we have less statistical power with rare variants, many different approaches have been proposed that aggregate variants for association analysis (as opposed to GWAS-type single-variant analysis). These methods were nicely reviewed here:

http://www.ncbi.nlm.nih.gov/pubmed/20940738

So I have gone through the effort of sequencing a region of the genome in a collection of samples, selecting variants for genotyping, designing a custom array, and genotyping my collection of samples. Now I want to look for rare variant associations. The review above references a beautiful collection of methods and approaches, so I contacted many of the authors of these methods to ask whether they could share their code with me to analyze my data. Well... sorry to say that I did not receive many replies. So, before I start coding these methods myself, my questions are:

If you have analyzed rare variants, can you recommend a tool or approach?

Are you aware of any comparison between these approaches?

Many thanks,

gwas variant association • 12k views

Nice question about an interesting field. Unfortunately, it might be hard, if not impossible, to get public data on this to simply play around with different methods. It would be very interesting for me and other users if you summed up your own experiences with different methods and software in an answer to your own question.


Just a small comment... what do you mean by rare variants? Below 1% in frequency?


Thanks for your comment. The boundary between rare and common variants is not well defined, to my knowledge, but for the purpose of the question let's go with your suggestion of <1%.


The Nature review you mention states rare variants are "defined by convention as <1% frequency" although frequency "might range from <0.1% to <0.01% depending on the context".


Are you working with population- or family-based data?

13.2 years ago
sa9 ▴ 870

There are more recent review papers (link1, link2) that cover even more methods for rare variant association studies. For example:

  • Single variants
    • χ2 test for contingency table
    • Fisher’s exact test
    • Cochran-Armitage test for trend
    • others
  • Multiple variants
    • Collapsing methods: such as CMC and CAST
    • Aggregation methods: WSS and KBAC
    • Bi-directional effect methods: C-alpha, Logistic Kernel-Machine Test, etc.
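To make the collapsing idea concrete, here is a minimal sketch in the spirit of CAST: each sample is reduced to a single "carries at least one rare allele in the region" flag, and carrier counts in cases and controls are compared with Fisher's exact test. This is an illustration under stated assumptions, not the published implementation of any of the methods above. The function names and the toy genotype matrices are hypothetical, and Fisher's exact test is written out with the standard library only so the example runs without extra packages.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum all tables that are no more likely than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def collapse_carriers(genotypes):
    """Collapse per-variant minor-allele counts into one carrier flag per sample."""
    return [int(any(count > 0 for count in sample)) for sample in genotypes]


def collapsing_test(case_genotypes, control_genotypes):
    """Compare carrier counts between cases and controls for one region."""
    case_carriers = sum(collapse_carriers(case_genotypes))
    control_carriers = sum(collapse_carriers(control_genotypes))
    table = (
        case_carriers,
        len(case_genotypes) - case_carriers,
        control_carriers,
        len(control_genotypes) - control_carriers,
    )
    return table, fisher_exact_two_sided(*table)


# Hypothetical toy data: rows are samples, columns are three rare variants in one gene.
cases = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0], [1, 0, 0], [0, 0, 0]]
controls = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0]]

table, p_value = collapsing_test(cases, controls)
print("carrier table (case+, case-, control+, control-):", table)
print("two-sided Fisher p-value:", round(p_value, 4))
```

Note that collapsing like this assumes the rare variants in the region mostly act in the same direction; a mix of risk and protective variants can cancel out, which is what the bi-directional methods are meant to address.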

The authors of the first paper note that the choice of method depends on many factors, such as:

  • Study design (if the trait is quantitative or dichotomous)
  • The assumption of the underlying genetics (rare variants vs. both rare and common variants are expected to contribute to disease)
  • Whether protective and risk variants are expected.

They didn't provide a recipe for choosing a method; rather, they rightly raise the point that most of these methods rely on simulated data generated under certain assumptions about the genetic architecture of a specific trait. So, they may not apply to other scenarios.

I've tried KBAC, Fisher's exact test, and a few others on my samples (n < 100), which is underpowered for rare variant association. Each test outputs a different list of candidate genes. Without increasing the number of samples in my study, I have to go through the pain of looking at every candidate gene at the top of each list and checking for biological clues that might connect it to my phenotype. A lot of false hopes and not fun at all.
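To put rough numbers on why a sample of about 100 is underpowered, the sketch below computes how many carriers of a single rare variant you would expect to see, and how often the variant would be missed entirely. It assumes Hardy-Weinberg equilibrium and independent samples; the function names and the allele frequencies are illustrative choices, not values from any study.

```python
def carrier_probability(maf):
    """Probability that one diploid individual carries at least one minor allele,
    assuming Hardy-Weinberg equilibrium."""
    return 1.0 - (1.0 - maf) ** 2


def expected_carriers(n_samples, maf):
    """Expected number of carriers among n_samples independent individuals."""
    return n_samples * carrier_probability(maf)


def prob_no_carriers(n_samples, maf):
    """Probability that none of the 2 * n_samples chromosomes carries the allele."""
    return (1.0 - maf) ** (2 * n_samples)


# Illustrative minor allele frequencies at and below the 1% convention.
for maf in (0.01, 0.005, 0.001):
    print(
        f"MAF={maf}: expected carriers in 100 samples = "
        f"{expected_carriers(100, maf):.2f}, "
        f"P(variant not observed) = {prob_no_carriers(100, maf):.3f}"
    )
```

With only a couple of expected carriers per variant at a 1% frequency, a single-variant test has almost nothing to work with, which is the motivation for pooling variants across a gene or region.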

IMO, until someone publishes a large-scale project testing for rare variant association (number of samples >= 10,000), there isn't a favorite method that everyone agrees on. You probably need to try many of the published methods on your data and get a sense of what they can provide.


Thanks for your answer and for those two references.


A more recent (2014) review for anyone interested in this topic: http://www.cell.com/ajhg/abstract/S0002-9297%2814%2900271-7

13.4 years ago
Nathan Nehrt ▴ 250

I have not personally used this tool, but I do know of CCRaVAT (Case-Control Rare Variant Analysis Tool), available for download from Sanger. It implements a collapsing-based approach for rare variant analysis.

13.4 years ago
Docroberson ▴ 310

I believe you can do this in PLINK-SEQ:

http://atgu.mgh.harvard.edu/plinkseq/

BUT beware that the last time I checked it out it was [1] still under development and [2] DID NOT yet have any source code available. It probably won't until the full first release and paper, to prevent getting scooped. I think it will let you do the C-alpha test on rare variants.
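For readers who want to see what the C-alpha statistic looks like, here is a minimal sketch of the idea as I understand it: for each variant, compare the number of copies seen in cases with what a binomial split between cases and controls would predict, and test for excess variance. It is not PLINK/SEQ's code and may differ from it. The function name and toy counts are hypothetical, the p-value uses a normal approximation, and singletons are not pooled here as published implementations typically do.

```python
from collections import Counter
from math import comb, erfc, sqrt


def c_alpha(case_copies, total_copies, p0):
    """Sketch of a C-alpha style test.

    case_copies[i]  : copies of variant i observed in cases
    total_copies[i] : copies of variant i observed in cases plus controls
    p0              : fraction of the samples that are cases
    Returns (T, Z, one-sided p-value from a normal approximation).
    """

    def term(u, n):
        # Squared deviation from the binomial mean, minus the binomial variance.
        return (u - n * p0) ** 2 - n * p0 * (1 - p0)

    t_stat = sum(term(y, n) for y, n in zip(case_copies, total_copies))

    # Null variance of T: expectation of term^2 under Binomial(n, p0),
    # summed over variants (grouped by their total copy count n).
    variance = 0.0
    for n, n_variants in Counter(total_copies).items():
        for u in range(n + 1):
            pmf = comb(n, u) * p0 ** u * (1 - p0) ** (n - u)
            variance += n_variants * term(u, n) ** 2 * pmf

    # Note: variance can be zero in degenerate inputs (e.g. only singletons
    # with p0 = 0.5); this sketch does not guard against that case.
    z = t_stat / sqrt(variance)
    p_value = 0.5 * erfc(z / sqrt(2))  # upper tail of the standard normal
    return t_stat, z, p_value


# Hypothetical toy data: three variants with four copies each, equal numbers of
# cases and controls. Two variants appear only in cases, one only in controls.
t_stat, z, p_value = c_alpha(case_copies=[4, 0, 4], total_copies=[4, 4, 4], p0=0.5)
print("T =", t_stat, "Z =", round(z, 3), "p =", p_value)
```

Because the statistic looks at variance rather than direction, the variant seen only in controls contributes as much as the ones seen only in cases, which is why this family of tests tolerates a mix of risk and protective variants.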


We've used PLINK/SEQ for the purpose of comparing rare variant association methods - a subset of the gene/group-based tests have been implemented: http://atgu.mgh.harvard.edu/plinkseq/assoc.shtml


Hello Brett, I tried the different tests available in PLINK/SEQ and got different results. In particular, I am interested in the two-sided tests, but C-alpha gives me no association while SUMSTAT does. I couldn't find a reference for this test. Do you know the difference? Did your results agree? Thanks

11.2 years ago
Nino ▴ 20

There is an R package called AssotesteR (http://cran.r-project.org/web/packages/AssotesteR/AssotesteR.pdf) that performs most of the tests developed to detect rare variant associations.

11.0 years ago
ben.bob ▴ 30

You could also try variant tools (varianttools.sourceforge.net), which implements more than 20 association test methods for rare variants. It also provides a whole set of tools to perform quality control and other tasks needed for a complete analysis pipeline.

13.4 years ago
K_Star ▴ 120

Are you working with population- or family-based data?


An important question indeed - but this is not an answer and more appropriately belongs as a comment under the original question.
