Question

Inflation in Imputed data

2

3.1 years ago

desicasares ▴ 40

Hi all, I am working with data genotyped on the Immunochip v1 and v2 arrays.

I have several cohorts, each with one dataset for cases and one for controls. All of them have passed quality control (MAF, mind, geno, HWE, missingness test, removal of non-autosomal chromosomes and indels, and removal of A/T and G/C SNPs with frequencies close to 0.5). After merging the case and control datasets within each cohort, I run a PCA: the samples form a homogeneous cloud, and the genomic inflation factor is around 0.95.

The problem appears when I impute these data (using the TOPMed Imputation Server and its multi-ethnic reference panel). When I run the logistic regression on the imputed data (after filtering again by MAF and HWE), I get very inflated results and many false positives on all chromosomes.

What can this be due to? Thanks in advance!

genomic immunochip inflation topmed • 1.2k views

updated 15 months ago by Amy ▴ 20 • written 3.1 years ago by desicasares ▴ 40

0

Probably not the cause - but did you make sure to filter on INFO imputation score as well?

It’s hard to answer without more details - specifically, what ethnicity is your dataset and how many PCs are you using in your model to control for population stratification? Inflation in test statistics is usually caused by population structure that hasn’t been controlled for - you need to be extra careful if your cohort is multi-ethnic or your sample size is very large.
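For reference, the genomic inflation factor discussed in this thread can be computed from association p-values roughly as follows. This is a minimal sketch, not taken from the poster's pipeline: the function name `genomic_inflation` is hypothetical, and it assumes 1-df tests (as in a per-SNP logistic regression) with the p-values already loaded into a Python list.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364231195724


def genomic_inflation(p_values):
    """Return lambda_GC: median observed chi-square / expected median (1 df).

    Hypothetical helper for illustration. P-values outside (0, 1] are skipped.
    """
    normal = NormalDist()
    chi2_stats = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            continue  # skip missing or invalid p-values
        # Two-sided p-value -> z-score -> 1-df chi-square statistic.
        # Using p / 2 (lower tail) avoids rounding 1 - p / 2 up to 1.0.
        z = normal.inv_cdf(p / 2.0)
        chi2_stats.append(z * z)
    if not chi2_stats:
        raise ValueError("no valid p-values supplied")
    return median(chi2_stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Evenly spaced p-values mimic a well-calibrated (null) test.
    n = 10001
    null_p = [(i + 0.5) / n for i in range(n)]
    print(round(genomic_inflation(null_p), 3))
```

Running it separately on the genotyped SNPs and on the imputed SNPs (or per Rsq or MAF bin) is one way to see where the inflation is concentrated.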

3.1 years ago by 4galaxy77 2.9k

0

Thank you for your answer. The populations I am using are Spanish, North European and Italian individuals.

For the Italians, the analysis of the imputed data went well with Rsq filters of both 0.9 and 0.3 (in addition to MAF and HWE). I have tried running the regression on the other two sets (the problematic ones) with both the 0.3 and 0.9 Rsq-filtered data, but the inflation is the same. I have also tried adjusting for both 5 and 10 PCs.

Regarding the sample sizes, one cohort contains 54 cases vs 700 controls, while the other contains 82 cases vs 1500 controls. However, the Italian cohort (the one that works) has a similar imbalance of 90 cases vs 1200 controls, so I doubt that this is the problem.

One possible cause I can imagine is the merge between cases and controls. However, when I run a logistic regression on the genotyped data using that same merge, the results show no inflation, and I have checked that both datasets were genotyped on the same array (Immunochip v1 in the case of the North Europeans, for example).

That's why I can't find the cause: the genotyped data seem to be under control, and the inflation only appears once I analyze the imputed data.

3.1 years ago by desicasares ▴ 40

0

desicasares, just curious whether you found a solution to this problem, as I'm experiencing the same issue in my analysis.

15 months ago by Amy ▴ 20