We have SNP chip data in PLINK format (bed, bim, fam files) for two different genotypes. We also have the proteomics expression matrices for these data sets.
I was wondering whether there is a tool or workflow we can follow to help us analyze the data.
I am new to GWAS and pQTL analysis and would like to understand how the data should be analyzed. I have found multiple papers on this topic, but none of them share their code.
So far I have done the pre-imputation preparation and the imputation using the Sanger Imputation Server. I have also done the post-imputation QC, but I am not sure how to proceed.
The next steps I would like help with are as follows:
- LD pruning to find independent SNPs
- determine the additive effects of each SNP on the protein expression (linear regression model)
- principal component analysis of specific phenotypes (e.g. gender, age, etc.). 4 ...
I'm not really sure how to continue, though. I know it can be done in PLINK, but can anyone help me understand it?
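To make the second step concrete, here is a minimal Python sketch of what I understand by an additive model for a single SNP and a single protein: ordinary least squares of expression on allele dosage (0/1/2), with optional covariates such as age and sex. The function name `additive_effect`, the simulated data, and the choice of covariates are my own assumptions for illustration only. This is not PLINK's implementation, and a real analysis would loop over all SNP-protein pairs with a dedicated tool.

```python
import numpy as np


def additive_effect(dosage, expression, covariates=None):
    """OLS of expression on allele dosage (0/1/2) plus optional covariates.

    Returns (beta, standard error, t statistic) for the dosage term.
    """
    dosage = np.asarray(dosage, dtype=float)
    expression = np.asarray(expression, dtype=float)
    n = dosage.shape[0]

    # Design matrix: intercept, dosage, then one column per covariate.
    columns = [np.ones(n), dosage]
    if covariates is not None:
        cov = np.asarray(covariates, dtype=float).reshape(n, -1)
        columns.extend(cov.T)
    X = np.column_stack(columns)

    # Least-squares fit and residual variance.
    coef, _, _, _ = np.linalg.lstsq(X, expression, rcond=None)
    resid = expression - X @ coef
    dof = n - X.shape[1]
    sigma2 = resid @ resid / dof

    # Standard error of the dosage coefficient (column index 1).
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    se = np.sqrt(cov_beta[1, 1])
    return coef[1], se, coef[1] / se


# Simulated example (hypothetical data, not from my study):
rng = np.random.default_rng(0)
n = 500
dosage = rng.binomial(2, 0.3, size=n)   # minor allele count per sample
age = rng.normal(50, 10, size=n)
sex = rng.integers(0, 2, size=n)
expression = 0.5 * dosage + 0.02 * age + rng.normal(0, 1, size=n)

beta, se, t = additive_effect(dosage, expression, np.column_stack([age, sex]))
print(f"beta={beta:.3f} se={se:.3f} t={t:.2f}")
```

If I understand correctly, genotype principal components would simply be added as extra covariate columns in the same way as age and sex here.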
Thanks,
Assa
P.S.
Did I miss something?