Hello All.
I have 88 samples in 23andMe format, all from a European (EUR) population. I would like to impute them with the Michigan Imputation Server using the Haplotype Reference Consortium (HRC) panel.
The first thing I did was convert the 23andMe files to VCF with bcftools, using the following command. The reference is hg18.
bcftools convert -c ID,CHROM,POS,AA -s sampleID -f hg18.fa --tsv2vcf sample.txt -Oz -o sampleID.vcf.gz
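Before converting, it can help to see what the raw file actually contains. The following is only a rough sketch, assuming the input is tab-separated with the four columns named in the command (ID, CHROM, POS, AA), that comment lines start with "#", and that no-calls are written with "-" characters. The function name and the example rows (including the rsIDs) are made up for illustration.

```python
import io
from collections import Counter


def summarize_23andme(handle):
    """Tally genotype classes in a 23andMe-style file (ID, CHROM, POS, AA)."""
    counts = Counter()
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        if len(fields) < 4 or not fields[3].strip():
            counts["malformed"] += 1
            continue
        genotype = fields[3].strip()
        if "-" in genotype:
            counts["no_call"] += 1          # assumption: "--" marks a no-call
        elif any(base not in "ACGT" for base in genotype):
            counts["non_snp"] += 1          # anything that is not plain A/C/G/T
        elif len(genotype) == 2 and genotype[0] != genotype[1]:
            counts["het"] += 1
        else:
            counts["hom_or_haploid"] += 1
    return counts


# Made-up rows, only to show the expected layout.
example = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs_example_1\t1\t1000\tAA\n"
    "rs_example_2\t1\t2000\tAG\n"
    "rs_example_3\t1\t3000\t--\n"
    "rs_example_4\t1\t4000\tDI\n"
)
summary = summarize_23andme(example)
print(dict(summary))
```

Running this over each sample file (with open("sample.txt") in place of the StringIO) gives a quick view of how many records are no-calls or non-SNP genotypes before anything is handed to bcftools.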
Then I merged the VCFs of all 88 samples into one VCF file and uploaded it to the Michigan Imputation Server, choosing the HRC panel for imputation.
But I got the following errors:
Input Validation
1 valid VCF file(s) found.
Samples: 88
Chromosomes: 1
SNPs: 164386
Chunks: 24
Datatype: unphased
Reference Panel: hrc
Quality Control
Execution successful
Statistics:
Alternative allele frequency > 0.5 sites: 60,835
Reference Overlap: 1.42%
Match: 214
Allele switch: 191
Strand flip: 191
Strand flip and allele switch: 215
A/T, C/G genotypes: 4
Filtered sites:
Filter flag set: 0
Invalid alleles: 62,461
Duplicated sites: 0
NonSNP sites: 0
Monomorphic sites: 0
Allele mismatch: 630
SNPs call rate < 90%: 172
Excluded sites in total: 63,669
Remaining sites in total: 100,717
Warning: 24 Chunks excluded: reference overlap < 50% (see statistics.txt for details).
Remaining chunk(s): 0
Error: No chunks passed the QC step. Imputation cannot be started!
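The totals in the QC report above can be checked with a little arithmetic. This is a small sketch using only the numbers printed in the report. That the strand-flip sites are counted as excluded is an inference from the fact that the sums match, not something the report states.

```python
# Counts copied from the QC report above.
total_snps = 164_386
filtered = {
    "invalid_alleles": 62_461,
    "allele_mismatch": 630,
    "low_call_rate": 172,
}
strand_flip = 191
strand_flip_and_switch = 215

# Sum of the listed filters alone.
excluded_filters_only = sum(filtered.values())
print(excluded_filters_only)  # 63263

# Assumption: strand-flip sites are excluded as well; with them the
# sum equals the reported "Excluded sites in total".
excluded_with_flips = excluded_filters_only + strand_flip + strand_flip_and_switch
print(excluded_with_flips)  # 63669

remaining = total_snps - excluded_with_flips
print(remaining)  # 100717

# Share of all input sites rejected as "Invalid alleles".
invalid_share = filtered["invalid_alleles"] / total_snps
print(f"{invalid_share:.1%}")  # 38.0%
```

So the invalid-allele filter alone accounts for roughly 38% of the uploaded sites, and the remaining 100,717 sites are the ones against which the very low reference overlap is reported.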
I also got some statistics like this:
......
Invalid Alleles: 1 (C/.)
Invalid Alleles: 1 (G/.)
Invalid Alleles: 1 (G/.)
Invalid Alleles: 1 (C/.)
Invalid Alleles: 1 (G/.)
Invalid Alleles: 1 (G/.)
Invalid Alleles: 1 (C/.)
Invalid Alleles: 1 (G/.)
Invalid Alleles: 1 (G/.)
Invalid Alleles: 1 (G/.)
Invalid Alleles: 1 (T/.)
Invalid Alleles: 1 (A/.)
Invalid Alleles: 1 (G/C,T)
Invalid Alleles: 1 (A/.)
Invalid Alleles: 1 (G/.)
Invalid Alleles: 1 (C/.)
Invalid Alleles: 1 (C/.)
Invalid Alleles: 1 (C/.)
......
INFO - Allele switch: rs4970362 - pos: 1084601 (ref: G/A, data: A/G)
INFO - Allele switch: rs6697886 - pos: 1163474 (ref: A/G, data: G/A)
FILTER - Low call rate: rs6697886 - pos: 1163474 (0.25)
FILTER - Allele mismatch: rs12563338 - pos: 1188481 (ref: G/A, data: T/A)
......
chunk_1_0000000001_0010000000 (Snps: 4158, Reference overlap: 0.017069701280227598, low sample call rates: false)
chunk_1_0010000001_0020000000 (Snps: 4547, Reference overlap: 0.018189692507579038, low sample call rates: false)
chunk_1_0020000001_0030000000 (Snps: 4094, Reference overlap: 0.01352657004830918, low sample call rates: false)
chunk_1_0030000001_0040000000 Sample NA06985: call rate: 0.49807037457434733
chunk_1_0030000001_0040000000 (Snps: 4405, Reference overlap: 0.012578616352201259, low sample call rates: true)
chunk_1_0040000001_0050000000 (Snps: 3991, Reference overlap: 0.016057312252964428, low sample call rates: false)
chunk_1_0050000001_0060000000 Sample NA06985: call rate: 0.4794905008635579
chunk_1_0050000001_0060000000 (Snps: 4632, Reference overlap: 0.01344717182497332, low sample call rates: true)
chunk_1_0060000001_0070000000 (Snps: 4885, Reference overlap: 0.017154389505549948, low sample call rates: false)
chunk_1_0070000001_0080000000 (Snps: 3948, Reference overlap: 0.014024542950162784, low sample call rates: false)
chunk_1_0080000001_0090000000 (Snps: 4674, Reference overlap: 0.013550709294939657, low sample call rates: false)
chunk_1_0090000001_0100000000 (Snps: 4589, Reference overlap: 0.01058543961978829, low sample call rates: false)
chunk_1_0100000001_0110000000 (Snps: 3942, Reference overlap: 0.016504126031507877, low sample call rates: false)
chunk_1_0110000001_0120000000 (Snps: 4830, Reference overlap: 0.014309076042518397, low sample call rates: false)
chunk_1_0120000001_0130000000 Sample NA06985: call rate: 0.4512820512820513
chunk_1_0120000001_0130000000 Sample NA07346: call rate: 0.47692307692307695
chunk_1_0120000001_0130000000 Sample NA12145: call rate: 0.48205128205128206
chunk_1_0120000001_0130000000 Sample NA12287: call rate: 0.47692307692307695
chunk_1_0120000001_0130000000 Sample NA12751: call rate: 0.49230769230769234
chunk_1_0120000001_0130000000 Sample NA12843: call rate: 0.4717948717948718
chunk_1_0120000001_0130000000 (Snps: 195, Reference overlap: 0.01015228426395939, low sample call rates: true)
chunk_1_0140000001_0150000000 (Snps: 1360, Reference overlap: 0.002932551319648094, low sample call rates: false)
chunk_1_0150000001_0160000000 (Snps: 4526, Reference overlap: 0.01504907306434024, low sample call rates: false)
chunk_1_0160000001_0170000000 (Snps: 5688, Reference overlap: 0.013368055555555555, low sample call rates: false)
chunk_1_0170000001_0180000000 (Snps: 4290, Reference overlap: 0.011305952930318412, low sample call rates: false)
chunk_1_0180000001_0190000000 (Snps: 4107, Reference overlap: 0.01516610495907559, low sample call rates: false)
chunk_1_0190000001_0200000000 (Snps: 4062, Reference overlap: 0.014111922141119221, low sample call rates: false)
chunk_1_0200000001_0210000000 (Snps: 5175, Reference overlap: 0.01111963190184049, low sample call rates: false)
chunk_1_0210000001_0220000000 (Snps: 4956, Reference overlap: 0.012385137834598482, low sample call rates: false)
chunk_1_0220000001_0230000000 Sample NA06985: call rate: 0.4813989752728893
chunk_1_0220000001_0230000000 (Snps: 4489, Reference overlap: 0.011032656663724626, low sample call rates: true)
chunk_1_0230000001_0240000000 (Snps: 5860, Reference overlap: 0.01815126050420168, low sample call rates: false)
chunk_1_0240000001_0250000000 (Snps: 3314, Reference overlap: 0.017533432392273403, low sample call rates: false)
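With this many chunk lines it is easier to summarise them with a script than to read them by eye. The following is a minimal sketch that parses lines in the format shown above. The regular expression is written against these pasted lines only, and the 0.5 threshold is taken from the "reference overlap < 50%" warning. The function name is hypothetical.

```python
import re

# Matches the per-chunk summary lines, not the per-sample call-rate lines.
CHUNK = re.compile(
    r"^(chunk_\S+) \(Snps: (\d+), Reference overlap: ([0-9.eE+-]+), "
    r"low sample call rates: (true|false)\)"
)


def parse_chunks(lines):
    """Return one dict per chunk summary line."""
    rows = []
    for line in lines:
        m = CHUNK.match(line.strip())
        if m:
            rows.append({
                "chunk": m.group(1),
                "snps": int(m.group(2)),
                "overlap": float(m.group(3)),
                "low_call": m.group(4) == "true",
            })
    return rows


# A few lines copied from the statistics above.
log = """chunk_1_0000000001_0010000000 (Snps: 4158, Reference overlap: 0.017069701280227598, low sample call rates: false)
chunk_1_0030000001_0040000000 Sample NA06985: call rate: 0.49807037457434733
chunk_1_0030000001_0040000000 (Snps: 4405, Reference overlap: 0.012578616352201259, low sample call rates: true)
chunk_1_0120000001_0130000000 (Snps: 195, Reference overlap: 0.01015228426395939, low sample call rates: true)
""".splitlines()

chunks = parse_chunks(log)
below_threshold = [c for c in chunks if c["overlap"] < 0.5]
print(len(chunks), len(below_threshold))
print(max(c["overlap"] for c in chunks))
```

Feeding the whole statistics.txt into parse_chunks would show whether any chunk comes close to the 50% overlap threshold. In the lines pasted here, none gets above about 2%.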
I don't quite understand the "Invalid alleles: 62,461" line. It seems that I need to clean up the raw data, but I think that would lose 62,461 of the 164,386 SNPs.
What should I do now? Any help would be greatly appreciated.
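As a way to see what kind of sites fall under "Invalid alleles", the lines from the statistics file can be grouped by their REF/ALT pattern. This is just a sketch, assuming the text in parentheses is REF/ALT as written in the uploaded VCF. In VCF, an ALT of "." means that no alternate allele is recorded at that site. The function name and category labels are made up here.

```python
import re
from collections import Counter

# Captures the REF and ALT parts of e.g. "Invalid Alleles: 1 (G/C,T)".
PATTERN = re.compile(r"Invalid Alleles:\s*\d+\s*\(([^/()]+)/([^()]+)\)")


def classify_invalid(lines):
    """Count invalid-allele lines by the shape of their ALT field."""
    counts = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if not m:
            continue
        alt = m.group(2)
        if alt == ".":
            counts["missing_alt"] += 1    # no alternate allele in the VCF
        elif "," in alt:
            counts["multiallelic"] += 1   # more than one alternate allele
        else:
            counts["other"] += 1
    return counts


# Lines copied from the statistics shown in the question.
sample_lines = [
    "Invalid Alleles: 1 (C/.)",
    "Invalid Alleles: 1 (G/.)",
    "Invalid Alleles: 1 (G/C,T)",
    "Invalid Alleles: 1 (A/.)",
]
invalid_counts = classify_invalid(sample_lines)
print(dict(invalid_counts))
```

In the excerpt pasted above, almost every invalid-allele line has "." as ALT and only one is multi-allelic, so a tally over the full file would show whether missing ALT alleles explain most of the 62,461 filtered sites.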
Hello,
I had the same problem today. The reference overlap is very low, around 1.7%, and there is no hint of what happened. Does anyone have any tips? Did you try the script shared by Vince? I'm sure that my data is hg19.
I'm new to imputation.
Thanks in advance.
Hi,
I had the same problem today. My reference overlap is 1.74%. How did you solve the problem? Could you please give me some advice?
Thank you very much.
Hi, I had the same problem right now! Did you manage to get it to work?
Hi, I have the same problem now. Did you solve it?