I followed the instructions on the SHAPEIT website (http://www.shapeit.fr/pages/m03_phasing/imputation.html) to check strand alignment, but the results don't make sense: the vast majority of the SNPs are listed as missing in the reference panel. The output is below in case it's helpful. I confirmed that both datasets are on build 37 (GRCh37/hg19), and I'm at a loss as to what else to try. Any suggestions would be greatly appreciated!
Segmented HAPlotype Estimation & Imputation Tool
  * Authors : Olivier Delaneau, Jared O'Connell, Jean-François Zagury, Jonathan Marchini
  * Contact : send an email to the OXSTATGEN mail list https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=OXSTATGEN
  * Webpage : https://mathgen.stats.ox.ac.uk/shapeit
  * Version : v2.r837
  * Date : 08/01/2018 22:13:07
  * LOGfile : [180108_chr10.alignments.log]

MODE -summarise : GENERATING SUMMARY STATISTICS OF THE INPUT DATA
  * Autosome (chr1 ... chr22)
  * Reference panel of haplotypes used

Parameters :
  * Seed : 1515467587
  * Parallelisation: 1 threads
  * Ref allele is NOT aligned on the reference genome

Reading site list in [lifted010818.10.map]
  * 25649 sites included

Reading sample list and genotypes in [lifted010818.10.ped] where missing-code = [0]
  * 4028 samples included
  * 4028 unrelateds / 0 duos / 0 trios in 4028 different families

Reading sample list [/ysm-gpfs/datasets/genomes/1000Genomes/1000GP_Phase3/1000GP_Phase3/1000GP_Phase3.sample]
  * 5008 reference haplotypes included

Reading SNPs in [/ysm-gpfs/datasets/genomes/1000Genomes/1000GP_Phase3/1000GP_Phase3/1000GP_Phase3_chr10.legend.gz]
  * 224 reference panel sites included
  * 4013234 reference panel sites excluded

ERROR: Reference and Main panels are not well aligned:
  * #Missing sites in reference panel = 24873
  * #Misaligned sites between panels = 552
  * #Multiple alignments between panels = 0
Have you solved the problem? I ran into the same issue and look forward to your reply.
I have been trying to solve this same issue for about 2 weeks now. I have done some troubleshooting, and the positions in my sample file are present in the haplotype reference panel. I can’t figure out how my reference and main panels can be misaligned when they share 80% of their sites (according to tests with my own scripts).
I had a data set with an unknown human genome assembly. I figured it out by intersecting its positions with those taken from dbSNP130 (hg18) and dbSNP150 (hg19). With the incorrect assembly, only 0.9% of positions were in common. According to SHAPEIT -check, only about 0.9% of your SNPs (224 of 25649) are in the reference. Why not double-check the assemblies of your data and the reference panel?
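In case it helps, here is a rough Python sketch of that kind of position-overlap check. It is only an illustration under some assumptions: the study sites are in a 4-column PLINK .map file (chromosome, SNP ID, genetic distance, bp position), and the reference is an IMPUTE2-style legend file whose header has a column named "position". The function names are made up for this example and are not part of SHAPEIT.

```python
import gzip


def map_positions(map_path):
    """Collect base-pair positions (4th column) from a PLINK .map file."""
    positions = set()
    with open(map_path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 4:  # skip blank or short lines
                positions.add(int(fields[3]))
    return positions


def legend_positions(legend_path):
    """Collect positions from an IMPUTE2-style legend file (plain or gzipped).

    Assumes the first line is a header containing a 'position' column.
    """
    opener = gzip.open if legend_path.endswith(".gz") else open
    with opener(legend_path, "rt") as fh:
        header = fh.readline().split()
        col = header.index("position")
        return {int(line.split()[col]) for line in fh if line.strip()}


def shared_fraction(study, reference):
    """Fraction of study positions that also occur in the reference."""
    if not study:
        return 0.0
    return len(study & reference) / len(study)


# Example use (paths are placeholders, substitute your own files):
# study = map_positions("my_data.chr10.map")
# ref = legend_positions("reference_chr10.legend.gz")
# print(f"{shared_fraction(study, ref):.1%} of study positions are in the reference")
```

A fraction near 1% on a single chromosome would be consistent with the two files being on different assemblies. Note that this compares positions only, so it says nothing about strand or allele mismatches.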