Question

GWAS data analysis strategy or pipeline

2

6.0 years ago

Shicheng Guo ★ 9.5k

Suppose I have received GWAS data for 5000 cases and 5000 controls (assume it was genotyped on an exome array). What kinds of analysis can I conduct to make full use of the genetic data? As far as I know, I need to proceed as follows, and I would welcome suggestions to improve the plan:

Convert the exome-array PLINK files to VCF format.
Flip all probes to the forward strand.
Run PCA to remove population outliers.
Send the data to the Michigan Imputation Server for phasing and imputation.
Run the association tests, allele-based and genotype-based, under different models: dominant, recessive, and so on.
Scan for compound heterozygotes, test for epistasis, test for interactions...
Run gene-based and pathway-based analyses.
Run a genetic risk score association analysis.
Perform biological validation.
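As an illustration of the allele-based and genotype-based step in the list above, here is a minimal sketch of a single-variant case/control test under allelic, dominant, and recessive codings. It assumes genotype counts are already tabulated per variant; the function names and the (AA, Aa, aa) count layout are made up for this example, and in practice a tool such as PLINK would run these tests genome-wide.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0  # degenerate table: no evidence either way
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p = math.erfc(math.sqrt(stat / 2.0))  # chi-square survival function, 1 df
    return stat, p

def association_tests(case, control):
    """case/control: genotype counts (AA, Aa, aa), where 'a' is the tested allele."""
    cAA, cAa, caa = case
    kAA, kAa, kaa = control
    tables = {
        # allele counts: 'a' alleles vs 'A' alleles
        "allelic": (2 * caa + cAa, 2 * cAA + cAa, 2 * kaa + kAa, 2 * kAA + kAa),
        # dominant: carriers of at least one 'a' vs non-carriers
        "dominant": (caa + cAa, cAA, kaa + kAa, kAA),
        # recessive: 'aa' homozygotes vs everyone else
        "recessive": (caa, cAA + cAa, kaa, kAA + kAa),
    }
    return {model: chi2_2x2(*t) for model, t in tables.items()}

# Made-up counts for one variant in 5000 cases and 5000 controls
results = association_tests(case=(3100, 1650, 250), control=(3300, 1500, 200))
for model, (stat, p) in results.items():
    print(f"{model}: chi2={stat:.2f}, p={p:.3g}")
```

Note that the allelic test implicitly assumes Hardy-Weinberg equilibrium, and that testing several models per variant adds to the multiple-testing burden.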

Any more suggestions?

weighted burden tests

GWAS SNP-array Exom-array • 2.6k views

ADD COMMENT • link 6.0 years ago by Shicheng Guo ★ 9.5k

1

Most of it depends on the aim of your project. What are you trying to achieve with your GWAS? Is there an aim to this project?

ADD REPLY • link 6.0 years ago by NB ▴ 960

0

No specific aim, just data mining. We want to get whatever we can from this data and make full use of it.

ADD REPLY • link 6.0 years ago by Shicheng Guo ★ 9.5k

score 3 · Answer 1 · 2018-11-12

3

6.0 years ago

Vivek ★ 2.7k

The phase-and-impute strategy would work if you had a genome-wide array of markers. If all you have is an exome array, there won't be enough genome-wide SNP coverage to impute accurately. The other option is to go with weighted burden tests. Whether you find something with them depends on how much power you have from 5000 cases and 5000 controls (that is, on what you have left after sample QC for admixture, kinship checks, etc.) and on whether your phenotype is binary or quantitative.

I'd suggest starting with a power analysis before spending time on crafting an analysis plan.
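A rough sketch of what such a power calculation could look like for a single-variant allelic test, using a normal approximation for the difference in allele frequencies. Everything here is an assumption for illustration: the function name, a multiplicative (per-allele) odds ratio, Hardy-Weinberg equilibrium, the control allele frequency standing in for the population frequency, and the default alpha of 5e-8 (for an exome array a Bonferroni threshold based on the number of variants tested may be more appropriate). Dedicated power calculators are a better choice for a real analysis plan, and burden tests need their own power model.

```python
from statistics import NormalDist

def allelic_power(maf_controls, odds_ratio, n_cases, n_controls, alpha=5e-8):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    norm = NormalDist()
    p0 = maf_controls
    # Allele frequency in cases implied by a per-allele odds ratio
    p1 = odds_ratio * p0 / (1.0 - p0 + odds_ratio * p0)
    # Each person contributes two alleles, hence 2 * n
    se = (p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    ncp = abs(p1 - p0) / se  # expected z-score under the alternative
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    # Probability that the z-statistic falls in either rejection tail
    return norm.cdf(ncp - z_crit) + norm.cdf(-ncp - z_crit)

# Hypothetical scenarios for 5000 cases and 5000 controls
for maf in (0.01, 0.05, 0.20):
    for odds_ratio in (1.1, 1.3, 1.5):
        power = allelic_power(maf, odds_ratio, 5000, 5000)
        print(f"MAF={maf:.2f} OR={odds_ratio:.1f} power={power:.3f}")
```

Re-running this with the sample sizes that remain after QC shows how much the achievable power drops.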

ADD COMMENT • link 6.0 years ago by Vivek ★ 2.7k

0

What's the aim of checking admixture and kinship? To remove those samples, or something else?
The phenotypes will include both binary and quantitative traits.
Power analysis is a very good suggestion!
You are right, an exome array doesn't impute very well. How high an R2 is required? R2 > 0.9?

ADD REPLY • link 6.0 years ago by Shicheng Guo ★ 9.5k

1

When you use a linear model to check for association (Y = XB + E), the core assumption is that the elements of Y are independent. That's why you check for kinship and admixture and remove any samples that violate that assumption.

ADD REPLY • link 6.0 years ago by Vivek ★ 2.7k

0

Interesting. So what is the best P_hat threshold to apply when removing samples?

ADD REPLY • link 6.0 years ago by Shicheng Guo ★ 9.5k

1

I'm not sure what you mean by p_hat. The admixture QC is done with PCA and a reference population such as 1000 Genomes. There must be a tutorial on Biostars if you search for it. For kinship, you run KING and remove any samples related more closely than third degree.
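To make the kinship filter concrete, here is a small sketch that classifies pairs by KING kinship coefficient and picks samples to drop. The degree boundaries (0.354, 0.177, 0.0884, 0.0442) are the inference ranges given in the KING documentation. The rest is assumed for illustration: the `(id1, id2, kinship)` tuple layout, the function names, the greedy removal rule, and the default cutoff of 0.0884, which reads "more than third degree" as second degree or closer (use 0.0442 to drop third-degree relatives as well).

```python
def relationship_degree(kinship):
    """Map a KING kinship coefficient to the relationship it suggests."""
    if kinship > 0.354:
        return "duplicate/MZ twin"
    if kinship >= 0.177:
        return "1st-degree"
    if kinship >= 0.0884:
        return "2nd-degree"
    if kinship >= 0.0442:
        return "3rd-degree"
    return "unrelated"

def samples_to_drop(pairs, cutoff=0.0884):
    """Greedily drop the sample that appears in the most related pairs
    until no pair with kinship >= cutoff remains."""
    related = [(a, b) for a, b, k in pairs if k >= cutoff]
    dropped = set()
    while related:
        counts = {}
        for a, b in related:
            counts[a] = counts.get(a, 0) + 1
            counts[b] = counts.get(b, 0) + 1
        # sorted() makes tie-breaking deterministic (first ID in sort order)
        worst = max(sorted(counts), key=counts.get)
        dropped.add(worst)
        related = [(a, b) for a, b in related if worst not in (a, b)]
    return dropped

# Hypothetical pairwise estimates, e.g. parsed from a KING output table
pairs = [("s1", "s2", 0.25), ("s1", "s3", 0.12), ("s4", "s5", 0.03)]
for a, b, k in pairs:
    print(a, b, relationship_degree(k))
print("drop:", samples_to_drop(pairs))
```

The greedy rule keeps more samples than dropping one member of every pair blindly. In a case/control study you may instead prefer to keep cases or the sample with the better call rate.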

ADD REPLY • link 6.0 years ago by Vivek ★ 2.7k