Logistic regression using HLA allelic data
5.1 years ago
monalc40 ▴ 30

I have a case-control dataset and I want to perform logistic regression and conditional logistic regression on HLA multi-allelic data, using R. I want to find the effect of specific alleles on a trait. How do I do this, and what format should the data be in? Most examples are based on biallelic SNP data. For instance, at HLA-A I may have up to 30 unique alleles; at HLA-B it could be 50. Should I recode all the alleles and perform logistic regression on genotype pairs?

R • 1.8k views

If you are merely asking this as a technical question, then you can do this in R via glm(). Your SNP predictors can be encoded categorically as AA, AB, BB, or continuously as minor allele counts.

Kevin
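
The two encodings mentioned above can be compared directly. This is a minimal sketch on simulated data; the variable names (dosage, status) and the effect size are assumptions made up for illustration, not values from the thread.

```r
set.seed(1)
n <- 200

# Simulated copies (0, 1 or 2) of one allele per individual, and a 0/1 status.
dosage <- rbinom(n, 2, 0.3)
status <- rbinom(n, 1, plogis(-1 + 0.5 * dosage))

# Additive encoding: one coefficient, the log odds ratio per extra copy.
fit_additive <- glm(status ~ dosage, family = "binomial")

# Categorical encoding: one coefficient per non-reference genotype class.
fit_genotypic <- glm(status ~ factor(dosage), family = "binomial")

# The additive model is nested in the categorical one, so a
# likelihood-ratio test checks whether the extra flexibility is needed.
print(anova(fit_additive, fit_genotypic, test = "Chisq"))
```

The additive coding spends one degree of freedom per allele, which matters when a locus has dozens of alleles.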

5.1 years ago
Lemire ▴ 940

Find a way to produce a data frame containing the count of each allele that you see, along with the case-control status. E.g. (fake data):

> df

  DX DRB1.0401 DRB1.0404 DRB1.0405 DRB1.0408
1  0         0         0         1         1
2  0         0         0         0         2
3  0         0         0         0         2
4  1         1         0         0         1
5  1         1         0         0         1
6  1         0         1         0         1
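
One possible way to get from per-individual allele calls to a count table like the one above is sketched here. The input layout (columns a1 and a2 holding the two DRB1 calls for each person) is an assumption for illustration; real HLA typing output will likely need its own parsing first.

```r
# Hypothetical input: one row per individual, two allele calls at the locus.
geno <- data.frame(
  DX = c(0, 0, 0, 1, 1, 1),
  a1 = c("04:05", "04:08", "04:08", "04:01", "04:01", "04:04"),
  a2 = c("04:08", "04:08", "04:08", "04:08", "04:08", "04:08"),
  stringsAsFactors = FALSE
)

alleles <- sort(unique(c(geno$a1, geno$a2)))

# For each allele, count how many copies (0, 1 or 2) each individual carries.
counts <- sapply(alleles, function(a) (geno$a1 == a) + (geno$a2 == a))
colnames(counts) <- paste0("DRB1.", gsub(":", "", alleles, fixed = TRUE))

df <- data.frame(DX = geno$DX, counts)
print(df)
```

Each row of counts sums to 2 when every allele at the locus gets its own column, which is a quick sanity check on the recoding.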

If you are interested in the effect of a specific allele, then you can do, e.g.

summary(glm( DX ~ DRB1.0401 , family="binomial", data=df ) )
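
To read an effect size off such a fit, the coefficient can be exponentiated into an odds ratio per allele copy. The sketch below uses simulated data (the six-row fake table is too small to fit reliably); the sample size and effect size are assumptions, and the interval shown is a Wald interval from confint.default().

```r
set.seed(42)
n <- 500

# Simulated allele counts (0, 1 or 2 copies) and case-control status.
df <- data.frame(DRB1.0401 = rbinom(n, 2, 0.2))
df$DX <- rbinom(n, 1, plogis(-0.5 + 0.7 * df$DRB1.0401))

fit <- glm(DX ~ DRB1.0401, family = "binomial", data = df)

# Row of the coefficient table for the allele of interest.
est <- coef(summary(fit))["DRB1.0401", ]

odds_ratio <- unname(exp(est["Estimate"]))               # per extra copy
wald_ci <- exp(confint.default(fit)["DRB1.0401", ])     # 95% Wald interval
p_value <- unname(est["Pr(>|z|)"])

print(c(OR = odds_ratio, wald_ci, p = p_value))
```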

If you are interested in the effect of your HLA locus as a whole, then you can do, e.g.

full<- glm( DX ~ DRB1.0401 + DRB1.0404 + DRB1.0405 + DRB1.0408, 
   family="binomial", data=df ) 
null<- glm( DX ~ 1 , family="binomial", data=df ) 

anova( null, full , test="Chisq")

Add covariates to both models if deemed necessary.
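
A sketch of the locus-level test with a covariate, on simulated data. The sex column, sample size and effect sizes are assumptions for illustration. The covariate goes into both the null and the full model so the comparison isolates the allele terms. In this simulation the four allele counts sum to 2 for every individual, so one allele is left out as the reference; otherwise glm() would report an NA coefficient for the redundant column.

```r
set.seed(7)
n <- 400
alleles <- c("0401", "0404", "0405", "0408")

# Two simulated allele calls per individual, recoded to copy counts.
a1 <- sample(alleles, n, replace = TRUE)
a2 <- sample(alleles, n, replace = TRUE)
counts <- sapply(alleles, function(a) (a1 == a) + (a2 == a))
colnames(counts) <- paste0("DRB1.", alleles)

df <- data.frame(counts, sex = rbinom(n, 1, 0.5))
df$DX <- rbinom(n, 1, plogis(-0.5 + 0.6 * df$DRB1.0401 + 0.3 * df$sex))

# The covariate appears in both models; DRB1.0408 is the reference allele.
null <- glm(DX ~ sex, family = "binomial", data = df)
full <- glm(DX ~ sex + DRB1.0401 + DRB1.0404 + DRB1.0405,
            family = "binomial", data = df)

lrt <- anova(null, full, test = "Chisq")
print(lrt)
```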


The problem has been solved, thanks


To expand on this, how would it look if you did have a covariate (sex)? For example, say I have a multiallelic locus with three possible SNPs:

set.seed(1)  # make the simulated data reproducible
n <- 200
data <- data.frame(
  snp1 = runif(n, min = 0, max = 2),
  # first 100 individuals get low values, last 100 get high values
  snp2 = c(runif(100, min = 0, max = 0.2), runif(100, min = 1.5, max = 2)),
  snp3 = c(runif(100, min = 0, max = 0.2), runif(100, min = 1.5, max = 2)),
  # one value per row, instead of 50 values silently recycled four times
  sex = runif(n, min = 0, max = 1),
  # disease is rare in the first 150 individuals and common in the last 50
  disease = c(rbinom(150, 1, 0.1), rbinom(50, 1, 0.9))
)

To test the locus as a whole, I would do this:

multi_snp_full <- glm(disease ~ snp2 + snp3 + sex, data=data, family="binomial")
null <- glm(disease ~ sex, data=data, family="binomial") 
anova( null, multi_snp_full , test="Chisq")

If I wanted to go back and test snp2 specifically, would it just be this (with no LR test)?

single_snp_test <- glm(disease ~ snp2 + sex, data=data, family="binomial")
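
One reading of this question, as a hedged sketch rather than a definitive answer: summary() of the single-SNP model already reports a Wald test for snp2, and a likelihood-ratio test is also available by comparing that model against one containing only the covariate. The data below are simulated for illustration (0/1/2 counts and a binary sex) and are not the poster's data.

```r
set.seed(3)
n <- 200

# Simulated stand-in data with the same column names as in the question.
data <- data.frame(snp2 = rbinom(n, 2, 0.3),
                   sex = rbinom(n, 1, 0.5))
data$disease <- rbinom(n, 1, plogis(-1 + 0.8 * data$snp2))

with_snp2 <- glm(disease ~ snp2 + sex, data = data, family = "binomial")
without_snp2 <- glm(disease ~ sex, data = data, family = "binomial")

# Wald test: taken from the coefficient table of the fitted model.
wald_p <- coef(summary(with_snp2))["snp2", "Pr(>|z|)"]

# Likelihood-ratio test: compare nested models that differ only by snp2.
lrt <- anova(without_snp2, with_snp2, test = "Chisq")
lrt_p <- lrt[2, "Pr(>Chi)"]

print(c(wald = wald_p, lrt = lrt_p))
```

The two p-values usually agree closely in large samples; they can diverge when the SNP is rare or nearly separates cases from controls.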
