Imputation on two genotyping datasets: should I impute them separately, or merge the two datasets first?
8.0 years ago
Tao ▴ 540

Hi guys,

I'm doing an eQTL analysis. The genotyping data are from two sequencing centers using the same type of SNP chip, but one center's genotyping has a better SNP call rate than the other: ~100,000 more SNPs were called. I did QC on the two datasets separately. QC also introduces some differences between the two SNP sets, which means some SNPs are removed from one dataset but not from the other.

Now I am stuck on the imputation step. Should I impute each dataset separately and then combine the two imputed datasets for the eQTL analysis, or should I first combine the two QCed datasets and impute them together? I don't know much about the principles of genotype imputation, so I hope someone can help me with this. Thanks!

Tao

imputation genotyping eQTL SNPs • 3.7k views

In case someone is in a similar situation, I'd like to answer this question myself. In the GTEx (v6p) protocol, they used two different genotyping arrays: OMNI 5M for the pilot phase and OMNI 2.5M for the mid-phase. They first downsized the 5M data to the portion of variants covered by the 2.5M array, and then did QC and imputation. But I think the other way is also feasible when only a small portion of variants is shared between the datasets, for example because of different array platforms or manufacturers. That's what I adopted: I did QC on each genotype batch separately and then merged them after imputation.
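
To get a feel for which route fits your data, it can help to count how many variants the two QCed batches actually share. Below is a minimal sketch of that check, not the GTEx pipeline itself. It assumes each batch is available as PLINK .bim-style lines (chromosome, variant ID, cM position, bp position, allele 1, allele 2) and matches variants by chromosome and position only. The function names and the toy data are made up for illustration.

```python
def read_bim_variants(lines):
    """Map (chromosome, bp position) -> variant ID from .bim-formatted lines."""
    variants = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        chrom, variant_id, _cm, bp = fields[:4]
        variants[(chrom, int(bp))] = variant_id
    return variants


def compare_variant_sets(bim_a, bim_b):
    """Return sorted lists of variants shared, only in A, and only in B."""
    a = read_bim_variants(bim_a)
    b = read_bim_variants(bim_b)
    shared = sorted(a.keys() & b.keys())
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    return shared, only_a, only_b


# Toy data, invented for this example (not real genotypes).
batch1 = ["1 rs1 0 1000 A G", "1 rs2 0 2000 C T", "2 rs3 0 1500 G A"]
batch2 = ["1 rs1 0 1000 A G", "2 rs3 0 1500 G A", "2 rs4 0 2500 T C"]

shared, only_1, only_2 = compare_variant_sets(batch1, batch2)
print(len(shared), len(only_1), len(only_2))
```

If most variants are shared, restricting both batches to the shared set before imputation (for example by writing the shared variant IDs to a file for PLINK's `--extract`) follows the GTEx-style route. If only a small fraction is shared, imputing each batch separately and merging afterwards may lose less information. Note that this sketch does not check allele or strand consistency, which would still need to be handled before any merge.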
