I finished my imputation on the Michigan Imputation Server, and this was the plot from the QC report. Do I need to redo the imputation, or does this just indicate that the alleles are flipped? The R2 looks good, though; maybe I am just unsure how to read this plot.
I would pull out the variants with panel_freq ~ 1 - upload_freq and min(panel_freq, upload_freq) < 0.4. I would check if these are G/C or A/T variants, or both; and also compute the fraction of total G/C // A/T variants they represent, to see if all (>90%) of such variants are flipped, or some subset.
If all such variants are flipped, I would swap the genotypes and re-impute.
If only 40% or fewer of such variants are flipped, then it will be difficult to find the logic as to which are properly stranded and which are flipped, so I would exclude all such variants, and re-impute.
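A minimal sketch of the check described above, in case it helps. The tolerance `tol` for "panel_freq ~ 1 - upload_freq" is my own assumption (the reply does not give one), and the function names and the `(allele1, allele2, upload_freq, panel_freq)` tuple layout are hypothetical. Note that the fraction is only meaningful if `variants` holds all G/C and A/T variants in the upload, not just the ones in the mismatch report.

```python
def is_palindromic(a1, a2):
    """True for A/T and C/G (strand-ambiguous) allele pairs."""
    return {a1.upper(), a2.upper()} in ({"A", "T"}, {"C", "G"})


def looks_flipped(upload_freq, panel_freq, tol=0.05, max_minor=0.4):
    """panel_freq ~ 1 - upload_freq and min(panel_freq, upload_freq) < max_minor.

    tol is an assumed tolerance for the '~'; tune it for your sample size.
    """
    mirrored = abs(panel_freq - (1.0 - upload_freq)) <= tol
    return mirrored and min(panel_freq, upload_freq) < max_minor


def flipped_fraction(variants, tol=0.05, max_minor=0.4):
    """Fraction of palindromic variants whose frequencies look mirrored.

    variants: iterable of (allele1, allele2, upload_freq, panel_freq),
    with both frequencies referring to allele1.
    """
    palindromic = [v for v in variants if is_palindromic(v[0], v[1])]
    if not palindromic:
        return 0.0
    flipped = [v for v in palindromic
               if looks_flipped(v[2], v[3], tol, max_minor)]
    return len(flipped) / len(palindromic)
```

A fraction above 0.9 would correspond to the "all such variants are flipped" case, and 0.4 or below to the "exclude them all" case.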
All of them are G/C and A/T variants.
Some seem to be mismatches based on AF and some seem to need flipping. I guess I am confused, because the Michigan Imputation Server pipeline documentation says that the pipeline does the flipping where necessary. The added complexity is that my dataset was on human build 38 and I am imputing to HRC (build 37), so I want to make sure I've understood correctly before doing any potentially unnecessary work.
"On Variant level:
Check alleles: Only A,C,G,T are allowed
Calculate alternative allele frequency (AF): Mark all with a AF > 0.5.
Calculate SNP call rate
Calculate chi square for each variant (reference panel vs. study data)
Determine allele switches: Compare ref and alt of reference panel with study data (A/T and C/G variants are ignored).
Determine strand flips: After eliminating possible allele switches, flip and compare ref/alt from reference panel with study data.
Determine allele switches in combination with strand flips: Combine the two rules from above."
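To make the quoted rules concrete, here is a rough sketch of how such an allele comparison could work. This is not the server's actual code, just my reading of the three rules; the function name `classify` and the returned labels are hypothetical. It also shows why A/T and C/G variants are left out: for those pairs a strand flip and an allele switch look identical from the alleles alone, so only allele frequencies can hint at the orientation.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify(study_ref, study_alt, panel_ref, panel_alt):
    """Compare study ref/alt with reference panel ref/alt for one SNP."""
    study = (study_ref, study_alt)
    panel = (panel_ref, panel_alt)
    # Only A, C, G, T are allowed.
    if any(allele not in COMPLEMENT for allele in study + panel):
        return "invalid"
    # A/T and C/G: strand cannot be resolved from the alleles alone.
    if COMPLEMENT[study_ref] == study_alt and set(study) == set(panel):
        return "ambiguous_palindromic"
    if study == panel:
        return "match"
    # Allele switch: ref and alt are swapped relative to the panel.
    if study == (panel_alt, panel_ref):
        return "allele_switch"
    # Strand flip: complement both alleles, then compare again.
    flipped = (COMPLEMENT[study_ref], COMPLEMENT[study_alt])
    if flipped == panel:
        return "strand_flip"
    if flipped == (panel_alt, panel_ref):
        return "strand_flip_and_switch"
    return "mismatch"
```

For example, `classify("T", "C", "A", "G")` gives `"strand_flip"`, while `classify("A", "T", "T", "A")` gives `"ambiguous_palindromic"`.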
Potential Frequency Mismatches
Markers where chisq is greater than 300.
## Total mismatches: 3064
## Mismatched frequencies for '1:979748' f[A,T] = [0.04060606,0.9593939] vs [0.9618,0.0382], chisq 17555.54 [[looks like it needs flipping]]
## Mismatched frequencies for '1:1267291' f[G,C] = [0.001840491,0.9981595] vs [0.9997,3e-04], chisq 33821.35
## Mismatched frequencies for '1:1269503' f[C,G] = [0.004358655,0.9956413] vs [0.9982,0.0018], chisq 32664.61
## Mismatched frequencies for '1:2273144' f[A,G] = [0.4945455,0.5054545] vs [0.7682,0.2318], chisq 635.6371 [[looks mismatched]]
## Mismatched frequencies for '1:2842483' f[C,T] = [0.7463145,0.2536855] vs [0.9426,0.0574], chisq 958.5274 [[looks mismatched]]
## Mismatched frequencies for '1:3279268' f[T,A] = [0.05266344,0.9473366] vs [0.928,0.072], chisq 11890.88
## Mismatched frequencies for '1:5923427' f[A,T] = [0.007263923,0.9927361] vs [0.9986,0.0014], chisq 32901.53
## Mismatched frequencies for '1:6531575' f[G,C] = [0.006658596,0.9933414] vs [0.9949,0.0051], chisq 30618.37
## Mismatched frequencies for '1:7887579' f[C,G] = [0.1065375,0.8934625] vs [0.9036,0.0964], chisq 8553.06
## Mismatched frequencies for '1:9009451' f[G,C] = [0.1063184,0.8936816] vs [0.9054,0.0946], chisq 8666.461
## Mismatched frequencies for '1:9307230' f[T,A] = [0.2475787,0.7524213] vs [0.7584,0.2416], chisq 2099.296
## Mismatched frequencies for '1:10296364' f[A,C] = [0.6963636,0.3036364] vs [0.8781,0.1219], chisq 456.4865
## Mismatched frequencies for '1:11968317' f[C,T] = [0.9656442,0.03435583] vs [0.9996,4e-04], chisq 886.3485
## Mismatched frequencies for '1:12638114' f[C,G] = [0.03090909,0.9690909] vs [0.9703,0.0297], chisq 19941.96
## Mismatched frequencies for '1:12865657' f[A,C] = [0.9624698,0.03753027] vs [0.9995,5e-04], chisq 942.3274
## Mismatched frequencies for '1:12867288' f[A,G] = [0.9635922,0.03640777] vs [0.9997,3e-04], chisq 1002.4
## Mismatched frequencies for '1:12869122' f[G,T] = [0.9624698,0.03753027] vs [0.9994,6e-04], chisq 899.982
## Mismatched frequencies for '1:12870842' f[C,T] = [0.9663814,0.03361858] vs [0.9996,4e-04], chisq 863.8973
## Mismatched frequencies for '1:12871828' f[G,A] = [0.9630303,0.03696969] vs [0.9994,6e-04], chisq 882.6173
## Mismatched frequencies for '1:14778427' f[C,T] = [0.5332929,0.466707] vs [0.8888,0.1112], chisq 1775.163
## Mismatched frequencies for '1:16151285' f[A,C] = [0.8733333,0.1266667] vs [0.9706,0.0294], chisq 450.9759
## Mismatched frequencies for '1:17284498' f[T,C] = [0.4479419,0.5520581] vs [0.9611,0.0389], chisq 6936.245
## Mismatched frequencies for '1:17355075' f[A,T] = [0.003631961,0.996368] vs [0.998,0.002], chisq 32635.71
## Mismatched frequencies for '1:18676399' f[C,A] = [0.5345036,0.4654964] vs [0.8014,0.1986], chisq 671.4267
## Mismatched frequencies for '1:19200392' f[T,A] = [0.001210654,0.9987894] vs [0.9999,1e-04], chisq 34008.25
## Mismatched frequencies for '1:19217693' f[T,A] = [0.02421308,0.9757869] vs [0.9846,0.0154], chisq 24970.8
## Mismatched frequencies for '1:20931828' f[A,T] = [0.7690909,0.2309091] vs [0.2134,0.7866], chisq 2656.207
## Mismatched frequencies for '1:22225743' f[G,C] = [0.3248485,0.6751515] vs [0.6636,0.3364], chisq 789.1561
## Mismatched frequencies for '1:22711473' f[G,C] = [0.1652542,0.8347458] vs [0.839,0.161], chisq 4570.477
## Mismatched frequencies for '1:23189565' f[G,C] = [0.001212121,0.9987879] vs [0.9997,3e-04], chisq 33866.12
## Mismatched frequencies for '1:24192103' f[C,G] = [0.005447942,0.9945521] vs [0.994,0.006], chisq 30146.24
## Mismatched frequencies for '1:25758419' f[T,C] = [0.4642857,0.5357143] vs [0.8573,0.1427], chisq 1791.04
## Mismatched frequencies for '1:27105880' f[C,G] = [0.001818182,0.9981818] vs [0.9998,2e-04], chisq 33914.26
## Mismatched frequencies for '1:28685602' f[A,G] = [0.7070218,0.2929782] vs [0.9989,0.0011], chisq 8928.828
## Mismatched frequencies for '1:30592003' f[G,T] = [0.9187042,0.08129584] vs [0.9999,1e-04], chisq 2580.742
## Mismatched frequencies for '1:31448207' f[T,G] = [0.5877724,0.4122276] vs [0.7874,0.2126], chisq 362.407
## Mismatched frequencies for '1:32170247' f[T,G] = [0.8742406,0.1257594] vs [0.9951,0.0049], chisq 2155.537
## Mismatched frequencies for '1:33272073' f[C,G] = [0.001820388,0.9981796] vs [0.9997,3e-04], chisq 33842.09
## Mismatched frequencies for '1:33547864' f[T,A] = [0.001210654,0.9987894] vs [0.9999,1e-04], chisq 34008.25
## Mismatched frequencies for '1:33838192' f[C,A] = [0.9393204,0.06067961] vs [0.9912,0.0088], chisq 377.6425
## Mismatched frequencies for '1:34605902' f[C,A] = [0.1579903,0.8420097] vs [0.4278,0.5722], chisq 471.4645
## Mismatched frequencies for '1:35364689' f[T,C] = [0.7596852,0.2403148] vs [0.9675,0.0325], chisq 1666.04
## Mismatched frequencies for '1:37536344' f[C,T] = [0.3226392,0.6773608] vs [0.7495,0.2505], chisq 1449.327
## Mismatched frequencies for '1:39059991' f[C,T] = [0.4205097,0.5794903] vs [0.6471,0.3529], chisq 347.9038
## Mismatched frequencies for '1:40539817' f[C,G] = [0.001210654,0.9987894] vs [0.9977,0.0023], chisq 32527.77
## Mismatched frequencies for '1:40557070' f[T,A] = [0.001210654,0.9987894] vs [0.9992,8e-04], chisq 33523.5
## Mismatched frequencies for '1:40781177' f[A,C] = [0.7921212,0.2078788] vs [0.9998,2e-04], chisq 6680.089
## Mismatched frequencies for '1:40961133' f[A,T] = [0.006060606,0.9939394] vs [0.997,0.003], chisq 31904.76
## Mismatched frequencies for '1:41019925' f[G,T] = [0.9,0.1] vs [0.9948,0.0052], chisq 1447.437
## Mismatched frequencies for '1:43299192' f[A,T] = [0.01333333,0.9866667] vs [0.9901,0.0099], chisq 27809.02
## Mismatched frequencies for '1:43803600' f[T,A] = [0.001212121,0.9987879] vs [0.9999,1e-04], chisq 34006.13
## Mismatched frequencies for '1:43804305' f[G,C] = [0.001212121,0.9987879] vs [0.9997,3e-04], chisq 33866.12
## Mismatched frequencies for '1:45228236' f[A,T] = [0.002421308,0.9975787] vs [0.9998,2e-04], chisq 33894.78
## Mismatched frequencies for '1:45974478' f[G,C] = [0.001815981,0.998184] vs [0.9997,3e-04], chisq 33846.69
## Mismatched frequencies for '1:47502059' f[C,T] = [0.6684982,0.3315018] vs [0.9832,0.0168], chisq 4998.66
## Mismatched frequencies for '1:47534954' f[C,T] = [0.00182704,0.9981729] vs [0.9989,0.0011], chisq 33283.62
## Mismatched frequencies for '1:47746675' f[C,G] = [0.004237288,0.9957627] vs [0.9972,0.0028], chisq 32100.04
## Mismatched frequencies for '1:47961174' f[T,C] = [0.8135593,0.1864407] vs [0.5537,0.4463], chisq 432.2093
## Mismatched frequencies for '1:49056636' f[T,A] = [0.001213592,0.9987864] vs [0.9998,2e-04], chisq 33933.78
## Mismatched frequencies for '1:49234376' f[G,C] = [0.2910085,0.7089915] vs [0.7041,0.2959], chisq 1237.168
## Mismatched frequencies for '1:52867839' f[C,G] = [0.00969697,0.990303] vs [0.9954,0.0046], chisq 30798.75
## Mismatched frequencies for '1:53722932' f[G,C] = [0.001815981,0.998184] vs [0.9999,1e-04], chisq 33986.55
## Mismatched frequencies for '1:54452246' f[A,G] = [0.9574727,0.04252734] vs [0.9968,0.0032], chisq 477.7615
## Mismatched frequencies for '1:56931473' f[C,A] = [0.7354015,0.2645985] vs [0.9348,0.0652], chisq 898.8943
## Mismatched frequencies for '1:57406565' f[C,G] = [0.001210654,0.9987894] vs [0.9997,3e-04], chisq 33868.4
## Mismatched frequencies for '1:57411642' f[C,G] = [0.001215067,0.998785] vs [0.9999,1e-04], chisq 34001.88
## Mismatched frequencies for '1:59248085' f[G,C] = [0.001829268,0.9981707] vs [0.9997,3e-04], chisq 33832.88
## Mismatched frequencies for '1:62299410' f[C,G] = [0.001216545,0.9987835] vs [0.9996,4e-04], chisq 33789.47
## Mismatched frequencies for '1:63252215' f[C,A] = [0.8286925,0.1713075] vs [0.9625,0.0375], chisq 669.4425
## Mismatched frequencies for '1:63715328' f[A,C] = [0.5623472,0.4376528] vs [0.9997,3e-04], chisq 14305.03
## Mismatched frequencies for '1:64127468' f[C,G] = [0.1890909,0.8109091] vs [0.8051,0.1949], chisq 3420.37
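With 3064 lines like these, it is easier to tabulate them than to read them by eye. Below is a small sketch that parses the report lines and counts how many mismatches are A/T or C/G versus other allele pairs. The regular expression is written against the lines pasted above; I am assuming the first frequency pair is the uploaded data and the second is the reference panel, and the names `parse_line` and `summarize` are hypothetical.

```python
import re

# Matches lines such as:
# ## Mismatched frequencies for '1:979748' f[A,T] = [0.04,0.96] vs [0.96,0.04], chisq 17555.54
LINE_RE = re.compile(
    r"Mismatched frequencies for '(?P<chrom>[^:']+):(?P<pos>\d+)' "
    r"f\[(?P<a1>[ACGT]),(?P<a2>[ACGT])\] = "
    r"\[(?P<s1>[^,\]]+),(?P<s2>[^,\]]+)\] vs "
    r"\[(?P<p1>[^,\]]+),(?P<p2>[^,\]]+)\], chisq (?P<chisq>[0-9.eE+-]+)"
)

PALINDROMIC = {frozenset("AT"), frozenset("CG")}


def parse_line(line):
    """Return a dict for one mismatch line, or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "chrom": m["chrom"],
        "pos": int(m["pos"]),
        "alleles": (m["a1"], m["a2"]),
        # Assumption: first pair = uploaded data, second pair = reference panel.
        "study": (float(m["s1"]), float(m["s2"])),
        "panel": (float(m["p1"]), float(m["p2"])),
        "chisq": float(m["chisq"]),
    }


def summarize(lines):
    """Count mismatch lines by allele-pair type (palindromic vs other)."""
    counts = {"palindromic": 0, "other": 0}
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # e.g. the '## Total mismatches' header
        key = "palindromic" if frozenset(rec["alleles"]) in PALINDROMIC else "other"
        counts[key] += 1
    return counts
```

Running `summarize` over the full report gives a quick check on whether the mismatches really are all A/T and C/G; the excerpt above also contains other pairs, such as A/G at 1:2273144.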
updated 7 months ago by GenoMax • written 7 months ago by kl
I agree that some need to be flipped (A/T and C/G variants where the frequency in your data is approximately 1 minus the reference panel frequency). There are other variants which you have labeled as "looks mismatched"; these may be errors in lift-over or errors in genotype calling.
I would go ahead and flip the A/T // C/G sites, and filter out the others, unless there is a strong expectation to have a small number of sites show an inflation or deflation of allele frequency...
OK, yes. My thought was that, given these errors, I could remove the mismatches and flip the alleles which seem to need flipping based on allele frequencies. Would I need to redo the imputation if I flipped these alleles and removed the mismatches, or is it fine to make these changes post-imputation? Thanks for advising - it's reassuring!
This does not warrant redoing the imputation, though? Do you have any advice on what to watch out for in this circumstance? Thanks!