Hi,
I'm wondering if it's possible to impute the presence of the HLA-B27 antigen from a 23andMe genome? This is a genome of European origin.
I've tried using the snp2hla program (link below), but the results don't look very good. I'm also a total novice at this stuff, so it's hard for me to see what's going on.
Here's what I did:
I converted the 23andMe genome to plink format:
plink2 --23file genome.txt familyid nameid M --out foo
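One sanity check on this step, sketched under assumptions, is to count how many of the converted markers actually fall in the MHC region, since imputation can only work with what overlaps the reference panel there. The helper name count_mhc_markers and the 29-34 Mb window on chromosome 6 are my own rough assumptions, not something taken from snp2hla; the column layout is the standard plink .bim one.

```shell
# Count markers in a plink .bim file that lie on chromosome 6 inside a window.
# .bim columns: chromosome, variant ID, cM position, bp position, allele 1, allele 2.
# The default window (29-34 Mb) is only a rough, assumed bound for the MHC region.
count_mhc_markers() {
    awk -v lo="${2:-29000000}" -v hi="${3:-34000000}" \
        '$1 == 6 && $4 >= lo && $4 <= hi { n++ } END { print n + 0 }' "$1"
}

# Run on the converted data only if the file is present.
if [ -f foo.bim ]; then
    count_mhc_markers foo.bim
fi
```

A very small count here would suggest the problem is in the input rather than in the imputation itself.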
Then I ran snp2hla:
./SNP2HLA.csh foo HM_CEU_REF foo2hla `which plink2` 2000 1000
This uses Beagle and some awk scripts to produce a lot of files, including a Beagle gprobs file, a dosage file and a bgl.phased file, among others. I haven't looked closely at the bgl.phased file, but I assume it holds the phased genotypes (I happen to know the correct phasing of the data, but I haven't spent time comparing).
The snp2hla program was originally made to be used with plink 1. Do you know if the file format has changed between plink 1 and 1.9/2?
Looking at the dosage file, it seems I get a 0.000% presence hit on all HLA antigens, which I find very odd. But I'm definitely seeing some imputed SNPs that aren't in the original 23andMe data.
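For reference, this is roughly how the HLA-B27 rows can be pulled out of the dosage file. It is only a sketch: I'm assuming each row is a marker ID, two alleles, and then one dosage per sample, and that the HLA-B27 markers are named with an HLA_B_27 prefix (for example HLA_B_27 and HLA_B_2705). The function name hla_b27_rows is made up for illustration, so check the layout against your own output first.

```shell
# Print the HLA-B27 marker rows from a snp2hla-style dosage file.
# Assumed layout (verify on your own file): marker ID, allele 1, allele 2,
# then one dosage value per sample; only the first sample is printed here.
hla_b27_rows() {
    awk '$1 ~ /^HLA_B_27/ { print $1, $2, $3, $4 }' "$1"
}

# Run on the snp2hla output only if the file is present.
if [ -f foo2hla.dosage ]; then
    hla_b27_rows foo2hla.dosage
fi
```

If this prints nothing at all, the marker naming assumption is probably wrong for your file, which is different from the markers being present with a dosage of zero.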
The snp2hla package used to include a large reference panel (T1DGC), but it has been removed from the site for security (or, I guess, privacy) reasons.
So my questions are: Is it at all possible to detect the presence of HLA-B27 from a 23andMe genome using a reference panel (I guess it should be), and do you have any idea if I'm doing something wrong?
Any hints will be appreciated!
Link to snp2hla: https://www.broadinstitute.org/mpg/snp2hla/
HIBAG is an HLA genotype imputation tool. It can be used with published parameter estimates (http://www.biostat.washington.edu/~bsweir/HIBAG/) instead of requiring access to large training sample datasets.