Question

Imputation on two genotyping datasets: should I impute them separately, or merge the two datasets first?

8.0 years ago

Tao ▴ 540

Hi guys,

I'm doing an eQTL analysis. The genotyping data come from two sequencing centers that used the same type of SNP chip, but one center's genotyping has a better SNP call rate than the other: ~100,000 more SNPs were called. I did QC on the two datasets separately. QC also introduces differences in the SNP sets, which means some SNPs are removed from one dataset but not from the other.
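
To see how large the difference between the two QCed SNP sets actually is, the variant lists can be compared directly. Below is a minimal sketch, assuming each dataset is available as PLINK `.bim`-style lines (the file format is an assumption, it is not stated above). The function names `read_variant_ids` and `compare_variant_sets` and the toy data are hypothetical, for illustration only.

```python
def read_variant_ids(bim_lines):
    """Return the set of variant IDs from PLINK .bim-style lines.

    A .bim line has six whitespace-separated fields:
    chromosome, variant ID, genetic distance, position, allele 1, allele 2.
    """
    ids = set()
    for line in bim_lines:
        fields = line.split()
        if len(fields) >= 2:
            ids.add(fields[1])  # second field is the variant ID
    return ids


def compare_variant_sets(ids_a, ids_b):
    """Count variants shared by both datasets and unique to each."""
    return {
        "shared": len(ids_a & ids_b),
        "only_a": len(ids_a - ids_b),
        "only_b": len(ids_b - ids_a),
    }


# Toy data, made up for illustration (not real genotypes).
center_a = [
    "1 rs1 0 1000 A G",
    "1 rs2 0 2000 C T",
    "1 rs3 0 3000 G A",
]
center_b = [
    "1 rs2 0 2000 C T",
    "1 rs3 0 3000 G A",
    "1 rs4 0 4000 T C",
    "1 rs5 0 5000 A C",
]

summary = compare_variant_sets(read_variant_ids(center_a), read_variant_ids(center_b))
print(summary)  # {'shared': 2, 'only_a': 1, 'only_b': 2}
```

Matching on variant ID alone is a simplification; if the two centers name variants differently, matching on chromosome, position, and alleles may be more reliable.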

Now I am stuck on the imputation step. Should I impute the two datasets separately and then combine the imputed genotypes for the eQTL analysis, or first combine the two QCed datasets and impute them together? I don't know much about the principles of genotype imputation, so I hope someone can help me with this. Thanks!

Tao

imputation genotyping eQTL SNPs • 3.7k views

I'd like to answer my own question, in case someone is in a similar situation. In the GTEx (v6p) protocol, they used two different genotyping arrays: OMNI 5M for the pilot phase and OMNI 2.5M for the mid-phase. They first downsized the 5M data to the portion of variants on the 2.5M array, and then did QC and imputation. But I think the other way is also feasible when only a small portion of variants is common to both datasets, perhaps because of different array platforms or manufacturers. That's what I adopted: I did QC on each genotype batch separately, imputed each batch, and then merged them after imputation.
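
The merge-after-imputation step can be sketched roughly as follows. This is only an illustration under stated assumptions: each imputed batch is represented as a plain dictionary mapping variant ID to per-sample dosages, and only variants present in both batches are kept so that every sample has a value for every retained variant. The function name `merge_imputed_batches`, the data layout, and the toy values are hypothetical and are not taken from GTEx or from any imputation tool.

```python
def merge_imputed_batches(batch_a, batch_b):
    """Merge two batches of imputed dosages on the variants present in both.

    Each batch maps variant ID -> {sample ID -> dosage}.
    Variants missing from either batch are dropped, so every retained
    variant has a value for every sample.
    """
    shared = sorted(set(batch_a) & set(batch_b))
    merged = {}
    for variant in shared:
        # Sample IDs must be unique across batches, otherwise values would be overwritten.
        overlap = set(batch_a[variant]) & set(batch_b[variant])
        if overlap:
            raise ValueError(f"duplicate sample IDs for {variant}: {sorted(overlap)}")
        merged[variant] = {**batch_a[variant], **batch_b[variant]}
    return merged


# Toy dosages, made up for illustration (not real data).
batch_a = {
    "rs1": {"A1": 0.0, "A2": 1.0},
    "rs2": {"A1": 2.0, "A2": 1.1},
}
batch_b = {
    "rs2": {"B1": 0.9},
    "rs3": {"B1": 2.0},
}

merged = merge_imputed_batches(batch_a, batch_b)
print(merged)  # {'rs2': {'A1': 2.0, 'A2': 1.1, 'B1': 0.9}}
```

In practice the merge would be done with a genotype toolkit on the imputed files rather than in pure Python; the point here is only the logic of restricting to shared variants before combining samples.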

7.1 years ago by Tao ▴ 540