Question

Low SNP imputation concordance - except chromosome 1

6.3 years ago

muraved ▴ 10

Hi,

I'm trying to do SNP imputation using IMPUTE2. My genotype data (bim/bed/fam) are on hg18, have been through the usual quality control, and come from samples of mostly European ancestry. The reference panel is on hg19.

My workflow:

Identify flipped (reverse-strand) and strand-ambiguous SNPs relative to hg18, using snpflip (https://github.com/biocore-ntnu/snpflip).
Use PLINK to create map/ped files, with the flipped SNPs flipped back and the ambiguous SNPs removed.
Lift over to hg19 using liftOverPlink.py (https://github.com/sritchie73/liftOverPlink).

Use GTOOL to convert ped/map to gen/sample. Final SNP counts per chromosome in the .gen file:

Run IMPUTE2 with the hg19 reference panel (https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.html), a chunk size of 5,000,000 bp, -Ne 20000 and -filt_rules_l 'EUR==0'.
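To make the chunking concrete: a chunk size of 5,000,000 means each chromosome is imputed in consecutive 5 Mb windows, with each window's bounds passed to IMPUTE2's -int option. Below is a minimal sketch of how such windows could be generated. The function name chunk_intervals and the chromosome length in the usage line are hypothetical and not taken from my actual scripts.

```shell
# Hypothetical helper: print "start end" for consecutive chunks covering
# positions 1..LENGTH. Each printed pair would be passed to IMPUTE2 as
# "-int start end" (assumption: one IMPUTE2 run per chunk).
chunk_intervals() {
    length=$1
    size=${2:-5000000}          # default chunk size: 5,000,000 bp
    start=1
    while [ "$start" -le "$length" ]; do
        end=$((start + size - 1))
        # Clip the last chunk to the end of the chromosome.
        if [ "$end" -gt "$length" ]; then
            end=$length
        fi
        echo "$start $end"
        start=$((end + 1))
    done
}
```

For example, `chunk_intervals 12000000` (a made-up length) would print three windows, the last one shorter than the others.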

I then parse the top-right entry of the concordance table (see https://mathgen.stats.ox.ac.uk/impute/impute_v2.html#concordance_tables) using:

cat $summaryfile | grep -A1 "Concordance" | grep -v "Concordance" | tr -s ' ' | cut -d ' ' -f 8,9 | grep -v -- -- | tr -d '\n'; echo
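The same extraction can be sketched with a single awk call, which may be easier to check than the pipeline. It assumes that the table header line contains "Concordance" and that the row right after it ends with the %Called and %Concordance values. The function name top_right_concordance is hypothetical.

```shell
# Hypothetical helper: print "%Called %Concordance" from the first row of
# every concordance table in an IMPUTE2 summary file.
# Assumption: the header line contains "Concordance" and the two values of
# interest are the last two fields of the line that follows it.
top_right_concordance() {
    awk '
        /Concordance/   { want = 1; next }          # header row: flag the next line
        want && NF >= 2 { print $(NF-1), $NF }      # first table row: last two fields
                        { want = 0 }                # any other line clears the flag
    ' "$1"
}
```

Called as `top_right_concordance "$summaryfile"`, it prints one line per table. Unlike the pipeline above, it does not join the values from several tables onto a single line.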

When I plot these values by chromosome, chr1 shows good concordance, as expected (~95%), whereas all other chromosomes are much worse:

The same happens when not filtering for EUR.

I'm at a loss here. Any idea what could be causing this?

SNP IMPUTE2 PLINK imputation GWAS
