Post-imputation plot
6 months ago
kl ▴ 10

Hi,

I finished my imputation on the Michigan Imputation Server, and this is the plot from the QC report. Do I need to redo the imputation, or does this just indicate that some alleles are flipped? The R2 looks good, though, so maybe I am just unsure how to read this plot.

Thanks for any advice!

[Image: plot from the QC report]

michigan-imputation-server quality-control • 1.0k views
6 months ago
LChart 4.5k

The anti-diagonal band does suggest some fraction of genotypes are strand-flipped - probably G/C or A/T variants.


So this does not warrant redoing the imputation? Do you have any advice on what to look out for in this situation? Thanks!


I would pull out the variants with panel_freq ~ 1 - upload_freq and min(panel_freq, upload_freq) < 0.4. I would check whether these are G/C variants, A/T variants, or both, and also compute what fraction of all G/C and A/T variants they represent, to see whether all (>90%) of such variants are flipped or only some subset.

If all such variants are flipped, I would swap the genotypes and re-impute.

If only 40% or fewer of such variants are flipped, then it will be difficult to find the logic as to which are properly stranded and which are flipped, so I would exclude all such variants, and re-impute.
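
A minimal Python sketch of the check described above, in case it helps. The function names, the tuple layout, and the 0.05 tolerance used for "~" are my own assumptions, not anything the server provides; the frequencies are assumed to refer to the same allele label in both datasets. The min(...) < 0.4 filter is there because near 0.5 a mirrored frequency and a matching frequency look the same. The range between 40% and 90% is not covered by the rule above, so the sketch just reports it as unclear.

```python
# Palindromic (strand-ambiguous) SNPs: the two alleles are complements.
PALINDROMIC = {frozenset("AT"), frozenset("CG")}


def is_palindromic(a1, a2):
    """True for A/T and C/G variants, in either order."""
    return frozenset((a1.upper(), a2.upper())) in PALINDROMIC


def looks_flipped(panel_freq, upload_freq, tol=0.05, max_minor=0.4):
    """panel_freq ~ 1 - upload_freq and min(panel_freq, upload_freq) < 0.4.

    tol is an assumed tolerance for '~'; tune it to your sample size.
    """
    mirrored = abs(panel_freq - (1.0 - upload_freq)) <= tol
    informative = min(panel_freq, upload_freq) < max_minor
    return mirrored and informative


def flip_decision(variants, tol=0.05):
    """variants: iterable of (allele1, allele2, panel_freq, upload_freq).

    Returns (fraction of palindromic variants that look flipped, suggestion).
    """
    palindromic = [v for v in variants if is_palindromic(v[0], v[1])]
    if not palindromic:
        return 0.0, "no palindromic variants"
    flipped = [v for v in palindromic if looks_flipped(v[2], v[3], tol)]
    frac = len(flipped) / len(palindromic)
    if frac > 0.9:
        action = "swap all palindromic variants and re-impute"
    elif frac <= 0.4:
        action = "exclude all palindromic variants and re-impute"
    else:
        action = "unclear: inspect further"  # not covered by the rule above
    return frac, action


# Toy input (made-up numbers, for illustration only).
toy = [
    ("A", "T", 0.96, 0.04),
    ("G", "C", 0.99, 0.002),
    ("C", "G", 0.90, 0.11),
    ("A", "T", 0.30, 0.30),
    ("A", "G", 0.50, 0.77),  # not palindromic, ignored in the fraction
]
fraction, suggestion = flip_decision(toy)
```

Note that the fraction is computed over all palindromic variants, not over all variants, since that is the denominator the rule cares about.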


All of them are G/C and A/T variants. Some look like mismatches based on AF, and some look like they need flipping. I guess I am confused, because the Michigan Imputation Server documentation says that the pipeline does the flipping where necessary. The added complexity is that my dataset is on human genome build 38 and I am imputing against HRC (build 37), so I want to make sure I've understood correctly before doing any more work.

"On Variant level:

Check alleles: Only A,C,G,T are allowed
Calculate alternative allele frequency (AF): Mark all with a AF > 0.5.
Calculate SNP call rate
Calculate chi square for each variant (reference panel vs. study data)
Determine allele switches: Compare ref and alt of reference panel with study data (A/T and C/G variants are ignored).
Determine strand flips: After eliminating possible allele switches, flip and compare ref/alt from reference panel with study data.
Determine allele switches in combination with strand flips: Combine the two rules from above."
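
To make the difference between these three cases concrete, here is a rough Python sketch of how a single site could be classified from its allele labels alone. This is only an illustration under my own assumptions (the `classify` function and its return labels are made up), not the server's actual implementation. It also shows why A/T and C/G sites are the problem: their complement is the same allele pair, so labels alone cannot tell a strand flip from an allele switch.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify(study_ref, study_alt, panel_ref, panel_alt):
    """Compare study ref/alt with reference-panel ref/alt for one SNP.

    Only single-base A, C, G, T alleles are handled in this sketch.
    """
    study = (study_ref.upper(), study_alt.upper())
    panel = (panel_ref.upper(), panel_alt.upper())
    if any(base not in COMPLEMENT for base in study + panel):
        return "invalid allele"
    # A/T and C/G: the complement of the pair is the pair itself,
    # so strand cannot be resolved from the labels.
    if COMPLEMENT[study[0]] == study[1]:
        return "ambiguous (A/T or C/G)"
    if study == panel:
        return "match"
    if study == panel[::-1]:
        return "allele switch"
    flipped = (COMPLEMENT[study[0]], COMPLEMENT[study[1]])
    if flipped == panel:
        return "strand flip"
    if flipped == panel[::-1]:
        return "strand flip + allele switch"
    return "allele mismatch"


# Panel site is A/G in every example below.
examples = {
    ("A", "G"): classify("A", "G", "A", "G"),
    ("G", "A"): classify("G", "A", "A", "G"),
    ("T", "C"): classify("T", "C", "A", "G"),
    ("C", "T"): classify("C", "T", "A", "G"),
}
```

For the ambiguous sites, allele frequency is the only remaining evidence, which is why the frequency comparison in the QC report matters for them.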

Potential Frequency Mismatches
Markers where chisq is greater than 300.

## Total mismatches: 3064

## Mismatched frequencies for '1:979748' f[A,T] = [0.04060606,0.9593939] vs [0.9618,0.0382], chisq 17555.54     [[looks like it needs flipping]]
## Mismatched frequencies for '1:1267291' f[G,C] = [0.001840491,0.9981595] vs [0.9997,3e-04], chisq 33821.35
## Mismatched frequencies for '1:1269503' f[C,G] = [0.004358655,0.9956413] vs [0.9982,0.0018], chisq 32664.61
## Mismatched frequencies for '1:2273144' f[A,G] = [0.4945455,0.5054545] vs [0.7682,0.2318], chisq 635.6371  [[looks mismatched]]
## Mismatched frequencies for '1:2842483' f[C,T] = [0.7463145,0.2536855] vs [0.9426,0.0574], chisq 958.5274   [[looks mismatched]]
## Mismatched frequencies for '1:3279268' f[T,A] = [0.05266344,0.9473366] vs [0.928,0.072], chisq 11890.88
## Mismatched frequencies for '1:5923427' f[A,T] = [0.007263923,0.9927361] vs [0.9986,0.0014], chisq 32901.53
## Mismatched frequencies for '1:6531575' f[G,C] = [0.006658596,0.9933414] vs [0.9949,0.0051], chisq 30618.37
## Mismatched frequencies for '1:7887579' f[C,G] = [0.1065375,0.8934625] vs [0.9036,0.0964], chisq 8553.06
## Mismatched frequencies for '1:9009451' f[G,C] = [0.1063184,0.8936816] vs [0.9054,0.0946], chisq 8666.461
## Mismatched frequencies for '1:9307230' f[T,A] = [0.2475787,0.7524213] vs [0.7584,0.2416], chisq 2099.296
## Mismatched frequencies for '1:10296364' f[A,C] = [0.6963636,0.3036364] vs [0.8781,0.1219], chisq 456.4865
## Mismatched frequencies for '1:11968317' f[C,T] = [0.9656442,0.03435583] vs [0.9996,4e-04], chisq 886.3485
## Mismatched frequencies for '1:12638114' f[C,G] = [0.03090909,0.9690909] vs [0.9703,0.0297], chisq 19941.96
## Mismatched frequencies for '1:12865657' f[A,C] = [0.9624698,0.03753027] vs [0.9995,5e-04], chisq 942.3274
## Mismatched frequencies for '1:12867288' f[A,G] = [0.9635922,0.03640777] vs [0.9997,3e-04], chisq 1002.4
## Mismatched frequencies for '1:12869122' f[G,T] = [0.9624698,0.03753027] vs [0.9994,6e-04], chisq 899.982
## Mismatched frequencies for '1:12870842' f[C,T] = [0.9663814,0.03361858] vs [0.9996,4e-04], chisq 863.8973
## Mismatched frequencies for '1:12871828' f[G,A] = [0.9630303,0.03696969] vs [0.9994,6e-04], chisq 882.6173
## Mismatched frequencies for '1:14778427' f[C,T] = [0.5332929,0.466707] vs [0.8888,0.1112], chisq 1775.163
## Mismatched frequencies for '1:16151285' f[A,C] = [0.8733333,0.1266667] vs [0.9706,0.0294], chisq 450.9759
## Mismatched frequencies for '1:17284498' f[T,C] = [0.4479419,0.5520581] vs [0.9611,0.0389], chisq 6936.245
## Mismatched frequencies for '1:17355075' f[A,T] = [0.003631961,0.996368] vs [0.998,0.002], chisq 32635.71
## Mismatched frequencies for '1:18676399' f[C,A] = [0.5345036,0.4654964] vs [0.8014,0.1986], chisq 671.4267
## Mismatched frequencies for '1:19200392' f[T,A] = [0.001210654,0.9987894] vs [0.9999,1e-04], chisq 34008.25
## Mismatched frequencies for '1:19217693' f[T,A] = [0.02421308,0.9757869] vs [0.9846,0.0154], chisq 24970.8
## Mismatched frequencies for '1:20931828' f[A,T] = [0.7690909,0.2309091] vs [0.2134,0.7866], chisq 2656.207
## Mismatched frequencies for '1:22225743' f[G,C] = [0.3248485,0.6751515] vs [0.6636,0.3364], chisq 789.1561
## Mismatched frequencies for '1:22711473' f[G,C] = [0.1652542,0.8347458] vs [0.839,0.161], chisq 4570.477
## Mismatched frequencies for '1:23189565' f[G,C] = [0.001212121,0.9987879] vs [0.9997,3e-04], chisq 33866.12
## Mismatched frequencies for '1:24192103' f[C,G] = [0.005447942,0.9945521] vs [0.994,0.006], chisq 30146.24
## Mismatched frequencies for '1:25758419' f[T,C] = [0.4642857,0.5357143] vs [0.8573,0.1427], chisq 1791.04
## Mismatched frequencies for '1:27105880' f[C,G] = [0.001818182,0.9981818] vs [0.9998,2e-04], chisq 33914.26
## Mismatched frequencies for '1:28685602' f[A,G] = [0.7070218,0.2929782] vs [0.9989,0.0011], chisq 8928.828
## Mismatched frequencies for '1:30592003' f[G,T] = [0.9187042,0.08129584] vs [0.9999,1e-04], chisq 2580.742
## Mismatched frequencies for '1:31448207' f[T,G] = [0.5877724,0.4122276] vs [0.7874,0.2126], chisq 362.407
## Mismatched frequencies for '1:32170247' f[T,G] = [0.8742406,0.1257594] vs [0.9951,0.0049], chisq 2155.537
## Mismatched frequencies for '1:33272073' f[C,G] = [0.001820388,0.9981796] vs [0.9997,3e-04], chisq 33842.09
## Mismatched frequencies for '1:33547864' f[T,A] = [0.001210654,0.9987894] vs [0.9999,1e-04], chisq 34008.25
## Mismatched frequencies for '1:33838192' f[C,A] = [0.9393204,0.06067961] vs [0.9912,0.0088], chisq 377.6425
## Mismatched frequencies for '1:34605902' f[C,A] = [0.1579903,0.8420097] vs [0.4278,0.5722], chisq 471.4645
## Mismatched frequencies for '1:35364689' f[T,C] = [0.7596852,0.2403148] vs [0.9675,0.0325], chisq 1666.04
## Mismatched frequencies for '1:37536344' f[C,T] = [0.3226392,0.6773608] vs [0.7495,0.2505], chisq 1449.327
## Mismatched frequencies for '1:39059991' f[C,T] = [0.4205097,0.5794903] vs [0.6471,0.3529], chisq 347.9038
## Mismatched frequencies for '1:40539817' f[C,G] = [0.001210654,0.9987894] vs [0.9977,0.0023], chisq 32527.77
## Mismatched frequencies for '1:40557070' f[T,A] = [0.001210654,0.9987894] vs [0.9992,8e-04], chisq 33523.5
## Mismatched frequencies for '1:40781177' f[A,C] = [0.7921212,0.2078788] vs [0.9998,2e-04], chisq 6680.089
## Mismatched frequencies for '1:40961133' f[A,T] = [0.006060606,0.9939394] vs [0.997,0.003], chisq 31904.76
## Mismatched frequencies for '1:41019925' f[G,T] = [0.9,0.1] vs [0.9948,0.0052], chisq 1447.437
## Mismatched frequencies for '1:43299192' f[A,T] = [0.01333333,0.9866667] vs [0.9901,0.0099], chisq 27809.02
## Mismatched frequencies for '1:43803600' f[T,A] = [0.001212121,0.9987879] vs [0.9999,1e-04], chisq 34006.13
## Mismatched frequencies for '1:43804305' f[G,C] = [0.001212121,0.9987879] vs [0.9997,3e-04], chisq 33866.12
## Mismatched frequencies for '1:45228236' f[A,T] = [0.002421308,0.9975787] vs [0.9998,2e-04], chisq 33894.78
## Mismatched frequencies for '1:45974478' f[G,C] = [0.001815981,0.998184] vs [0.9997,3e-04], chisq 33846.69
## Mismatched frequencies for '1:47502059' f[C,T] = [0.6684982,0.3315018] vs [0.9832,0.0168], chisq 4998.66
## Mismatched frequencies for '1:47534954' f[C,T] = [0.00182704,0.9981729] vs [0.9989,0.0011], chisq 33283.62
## Mismatched frequencies for '1:47746675' f[C,G] = [0.004237288,0.9957627] vs [0.9972,0.0028], chisq 32100.04
## Mismatched frequencies for '1:47961174' f[T,C] = [0.8135593,0.1864407] vs [0.5537,0.4463], chisq 432.2093
## Mismatched frequencies for '1:49056636' f[T,A] = [0.001213592,0.9987864] vs [0.9998,2e-04], chisq 33933.78
## Mismatched frequencies for '1:49234376' f[G,C] = [0.2910085,0.7089915] vs [0.7041,0.2959], chisq 1237.168
## Mismatched frequencies for '1:52867839' f[C,G] = [0.00969697,0.990303] vs [0.9954,0.0046], chisq 30798.75
## Mismatched frequencies for '1:53722932' f[G,C] = [0.001815981,0.998184] vs [0.9999,1e-04], chisq 33986.55
## Mismatched frequencies for '1:54452246' f[A,G] = [0.9574727,0.04252734] vs [0.9968,0.0032], chisq 477.7615
## Mismatched frequencies for '1:56931473' f[C,A] = [0.7354015,0.2645985] vs [0.9348,0.0652], chisq 898.8943
## Mismatched frequencies for '1:57406565' f[C,G] = [0.001210654,0.9987894] vs [0.9997,3e-04], chisq 33868.4
## Mismatched frequencies for '1:57411642' f[C,G] = [0.001215067,0.998785] vs [0.9999,1e-04], chisq 34001.88
## Mismatched frequencies for '1:59248085' f[G,C] = [0.001829268,0.9981707] vs [0.9997,3e-04], chisq 33832.88
## Mismatched frequencies for '1:62299410' f[C,G] = [0.001216545,0.9987835] vs [0.9996,4e-04], chisq 33789.47
## Mismatched frequencies for '1:63252215' f[C,A] = [0.8286925,0.1713075] vs [0.9625,0.0375], chisq 669.4425
## Mismatched frequencies for '1:63715328' f[A,C] = [0.5623472,0.4376528] vs [0.9997,3e-04], chisq 14305.03
## Mismatched frequencies for '1:64127468' f[C,G] = [0.1890909,0.8109091] vs [0.8051,0.1949], chisq 3420.37
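
With around 3,000 of these lines, sorting them by eye is not practical. Below is a small Python sketch that parses lines in the format shown above and separates likely flips from other mismatches. It assumes the first frequency pair is the uploaded data and the second is the reference panel, and that a 0.05 tolerance is a reasonable meaning of "mirrored"; the function names and labels are mine. A site is only called a flip candidate if it is A/T or C/G and the two frequency pairs are mirror images of each other.

```python
import re

# Matches one "Mismatched frequencies" line of the QC log.
LINE_RE = re.compile(
    r"Mismatched frequencies for '(?P<id>[^']+)' "
    r"f\[(?P<a1>[ACGT]),(?P<a2>[ACGT])\] = "
    r"\[(?P<s1>[^,\]]+),(?P<s2>[^,\]]+)\] vs "
    r"\[(?P<p1>[^,\]]+),(?P<p2>[^,\]]+)\], "
    r"chisq (?P<chisq>[0-9.eE+-]+)"
)

PALINDROMIC = ({"A", "T"}, {"C", "G"})


def parse_line(line):
    """Return a dict for one log line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "id": m.group("id"),
        "alleles": (m.group("a1"), m.group("a2")),
        # Assumption: first pair = uploaded data, second pair = panel.
        "study": (float(m.group("s1")), float(m.group("s2"))),
        "panel": (float(m.group("p1")), float(m.group("p2"))),
        "chisq": float(m.group("chisq")),
    }


def label(rec, tol=0.05):
    """'flip candidate' if palindromic and frequencies are mirrored."""
    mirrored = abs(rec["study"][0] - rec["panel"][1]) <= tol
    palindromic = set(rec["alleles"]) in PALINDROMIC
    return "flip candidate" if (mirrored and palindromic) else "other mismatch"


# Three lines copied from the log above.
log = """\
## Mismatched frequencies for '1:979748' f[A,T] = [0.04060606,0.9593939] vs [0.9618,0.0382], chisq 17555.54
## Mismatched frequencies for '1:2273144' f[A,G] = [0.4945455,0.5054545] vs [0.7682,0.2318], chisq 635.6371
## Mismatched frequencies for '1:47534954' f[C,T] = [0.00182704,0.9981729] vs [0.9989,0.0011], chisq 33283.62
"""
records = [r for r in map(parse_line, log.splitlines()) if r is not None]
labels = {r["id"]: label(r) for r in records}
```

The third line is worth a look: 1:47534954 is a C/T site with mirrored frequencies, so it is not a palindromic strand flip and ends up under "other mismatch" in this sketch.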
6 months ago
LChart 4.5k

I agree that some need to be flipped (A/T and C/G where your dataset's frequency is approximately 1 - the reference panel frequency). There are other variants which you have labeled as "looks mismatched" - these may be errors in lift-over, or errors in genotype calling.

I would go ahead and flip the A/T and C/G sites, and filter out the others, unless you have a strong reason to expect a small number of sites to show genuinely inflated or deflated allele frequencies...


OK, yes. My thought was that, given these errors, I could remove the mismatches and flip the alleles that seem to need flipping based on allele frequencies. Would I need to redo the imputation after flipping these alleles and removing the mismatches, or is it fine to make these changes post-imputation? Thanks for advising - it's reassuring!


Imputation takes what? A day or two of compute time? Just re-do it.
