Question

How to do a Haplotype based GWAS?

6.1 years ago

anikcropscience ▴ 270

Hello, this is my first post in this renowned group. Actually, I am struggling with haplotype-based GWAS. I am doing GWAS on a plant pathogen. The single-SNP GWAS has not yielded any significant results, so I have decided to use haplotype-based GWAS. I have a VCF file containing 717,045 SNPs. I used PLINK to generate haplotype blocks, which yielded more than 74,000 blocks. But I do not know how to use that data in the GWAS. The PLINK output is in .blocks and .blocks.det format.

Can someone please give me some ideas about how to use that data for GWAS? Or how can I perform a haplotype-based GWAS, and with which tool?

Looking forward to your reply. Thank you very much.

Regards Anik Dutta

genome snp plink • 3.9k views

ADD COMMENT • link 6.1 years ago by anikcropscience ▴ 270

Thank you very much, Kevin. But the problem is that I have quite a few SNPs that are multiallelic and I do not want to lose them. If I import them into PLINK they are lost, because PLINK only accepts biallelic SNPs. Do you have any suggestions on how I can keep the multiallelic SNPs?

Anik

ADD REPLY • link 6.1 years ago by anikcropscience ▴ 270

What if you split the multi-allelic records into individual records? This can be done with bcftools norm -m-any
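On the command line this would typically look something like `bcftools norm -m-any -Oz -o split.vcf.gz input.vcf.gz`, where both file names are placeholders. To illustrate what "splitting" means for the genotypes, here is a minimal Python sketch. It is not a reimplementation of bcftools: the function name is hypothetical, only the GT field is kept, INFO is copied unchanged, and recoding every other ALT allele as reference (0) is an assumption of this sketch, so check the bcftools documentation for the exact behaviour.

```python
def split_multiallelic(vcf_line):
    """Split one VCF data line with several ALT alleles into biallelic lines.

    Simplifications (assumptions of this sketch): GT is the first FORMAT
    field, only GT is kept, INFO is copied unchanged, and any allele other
    than the one being kept is recoded as reference (0).
    """
    fields = vcf_line.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    if len(alts) == 1:
        # Already biallelic: return the line untouched.
        return [vcf_line.rstrip("\n")]

    out = []
    for alt_index, alt in enumerate(alts, start=1):
        new_fields = fields[:9]      # copy of the fixed columns + FORMAT
        new_fields[4] = alt          # keep a single ALT allele
        new_fields[8] = "GT"         # only GT is carried over
        for sample in fields[9:]:
            gt = sample.split(":")[0]
            sep = "|" if "|" in gt else "/"
            recoded = []
            for allele in gt.split(sep):
                if allele == ".":
                    recoded.append(".")      # missing stays missing
                elif int(allele) == alt_index:
                    recoded.append("1")      # the ALT kept in this record
                else:
                    recoded.append("0")      # REF or a different ALT
            new_fields.append(sep.join(recoded))
        out.append("\t".join(new_fields))
    return out
```

Each resulting line has exactly one ALT allele, which is the form PLINK can import.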

ADD REPLY • link 6.1 years ago by Kevin Blighe 89k

Can this be done with VCFtools or bcftools? Is bcftools included in PLINK? Sorry if the question is naive; I am new to this field.

ADD REPLY • link 6.1 years ago by anikcropscience ▴ 270

No, you would have to split these multi-allelic calls outside of PLINK, and then input the VCF file(s) back into PLINK.

ADD REPLY • link 6.1 years ago by Kevin Blighe 89k

Ok thanks a lot for the information.

ADD REPLY • link 6.1 years ago by anikcropscience ▴ 270

score 0 · Answer 1 · 2019-05-08

6.1 years ago

Kevin Blighe 89k

Hey, you are very welcome here.

If you use the older versions of PLINK (pre v1.9), you can simply use the --hap-assoc command line parameter. Take a look here: Multimarker haplotype tests

The feature does not seem to be implemented in version >=1.9. See the bottom of THIS page:

The .blocks file is valid input for PLINK 1.07's --hap command. However, the --hap... family of flags has not been reimplemented in PLINK 1.9 due to poor phasing accuracy (and, consequently, inferior haplotype likelihood/frequency estimates) relative to other software; for now, we recommend using BEAGLE 3.3.2 instead of PLINK for case/control haplotype association analysis. (You can use "--recode beagle" to export data.) We apologize for the inconvenience, and plan to develop variants of the --hap... flags which handle pre-phased data effectively.

You also have the option of exporting your data and creating your own statistical test, e.g., in R.
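As a rough idea of what such a home-made test could look like, here is a minimal sketch in Python rather than R. It assumes you already have phased haplotypes for one block and a binary (case/control) phenotype, and it runs a plain 2x2 chi-square test on haplotype counts. The function name and the input layout are hypothetical. It treats the two haplotypes of each sample as independent observations and ignores covariates and population structure, so it is only a starting point.

```python
import math

def haplotype_chi_square(haplotypes, is_case, target):
    """2x2 chi-square test (1 degree of freedom) for one haplotype.

    haplotypes: list of (hap1, hap2) strings per sample, already phased.
    is_case:    list of booleans, in the same order as haplotypes.
    target:     the haplotype string tested against all other haplotypes.
    Returns (chi2, p_value).
    """
    # counts[group] = [copies of target, copies of any other haplotype]
    counts = {True: [0, 0], False: [0, 0]}
    for pair, case in zip(haplotypes, is_case):
        for hap in pair:
            counts[case][0 if hap == target else 1] += 1

    a, b = counts[True]    # cases: target, other
    c, d = counts[False]   # controls: target, other
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the 2x2 table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

You would call this once per haplotype per block and then correct for the number of tests. For a quantitative trait, a regression of the phenotype on haplotype copy number would be the analogous approach.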

Kevin

ADD COMMENT • link 6.1 years ago by Kevin Blighe 89k

Hi Kevin,

I am in a similar position. I am interested in testing the association between some predetermined haplotypes and a phenotype. I have dosage data from imputation that looks something like this:

##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DS,Number=1,Type=Float,Description="Estimated Alternate Allele Dosage : [P(0/1)+2*P(1/1)]">
##FORMAT=<ID=HDS,Number=2,Type=Float,Description="Estimated Haploid Alternate Allele Dosage">
##FORMAT=<ID=GP,Number=3,Type=Float,Description="Estimated Posterior Probabilities for Genotypes 0/0, 0/1 and 1/1">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
chrX    2781514 chrX:2781514:C:A        C       A       .       PASS    IMPUTED;AF=0.38916;MAF=0.38916;R2=0.34152;AN=118530;AC=31088    GT:DS:HDS:GP    0|0:0.429:0.214,0.214:0.617,0.337,0.046 
chrX    2781604 chrX:2781604:G:T        G       T       .       PASS    IMPUTED;AF=0.11608;MAF=0.11608;R2=0.84717;AN=118530;AC=15266    GT:DS:HDS:GP    0|0:0:0,0:1,0,0 
chrX    2781642 chrX:2781642:G:A        G       A       .       PASS    IMPUTED;AF=0.00109333;MAF=0.00109333;R2=0.37754;AN=118530;AC=58 GT:DS:HDS:GP    0|0:0.002:0.001,0.001:0.998,0.002,0    
chrX    2782104 chrX:2782104:C:G        C       G       .       PASS    IMPUTED;AF=0.000133333;MAF=0.000133333;R2=0.48194;AN=118530;AC=13       GT:DS:HDS:GP    0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0

As an example, is there any way to use the dosage data to test the association between a phenotype and a haplotype that is defined by having:

A at chrX:2781514 and T at chrX:2781604

?

That sounds almost exactly like what the PLINK developers were planning to implement by making the --hap... flags handle pre-phased data, but they never got there. I guess I could just hard-call the haplotypes, but I am wondering if there is a way to do an association test that uses the dosages, so that the uncertainty around imputation and phasing is accounted for.

If you have time to read, thank you.
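To make the idea concrete, one possible way to turn the HDS values into a "haplotype dosage" is sketched below. It assumes that each HDS value can be read as the probability that the corresponding strand carries the ALT allele, that the two values at every site are listed in the same strand order, and that the uncertainty at different sites is independent within a strand. These are assumptions of the sketch, not something the imputation output guarantees, and the function name is hypothetical. Haploid samples on chrX would need separate handling.

```python
def haplotype_dosage(hds_by_site, wants_alt):
    """Expected number of copies (0 to 2) of a haplotype in one diploid sample.

    hds_by_site: list of (strand1, strand2) HDS pairs, one pair per site.
    wants_alt:   list of booleans; True if the haplotype carries ALT at
                 that site, False if it carries REF.
    """
    dosage = 0.0
    for strand in (0, 1):
        prob = 1.0
        for hds, alt in zip(hds_by_site, wants_alt):
            p_alt = hds[strand]
            # Probability that this strand matches the haplotype at this site.
            prob *= p_alt if alt else (1.0 - p_alt)
        dosage += prob
    return dosage


# Sample from the excerpt above: A at chrX:2781514 and T at chrX:2781604
# (both are the ALT allele at their site).
example = haplotype_dosage([(0.214, 0.214), (0.0, 0.0)], [True, True])
```

The resulting value per sample could then be used as the predictor in an ordinary linear or logistic regression, in the same way a single-SNP dosage would be.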

ADD REPLY • link 4.7 years ago by curious ▴ 890