Hi everyone,
I'm currently trying to rephase data using a file that merges 2 datasets: (1) my phased parent-offspring samples from the first round of phasing and (2) the original reference panel (African Genome Diversity Project).
When I rerun shapeit with the same steps and flags as before, I get the following (see the command and log below). Does anyone know why it won't complete? I've already removed mismatched SNPs, flipped strands where needed, etc. The input should be clean and ready to go, so I suspect the problem is the newly merged reference panel. Any help is appreciated, thanks.
shapeit -B ${INPUT}_endsTrimmed -M ${MAP} --input-ref ${REF_HAP} ${REF_LEG} ${REF_SAMP} --duohmm -W 5 -O ${OUTPUT} --output-log ${LOG}.log -T 54
Segmented HAPlotype Estimation & Imputation Tool
* Authors : Olivier Delaneau, Jared O'Connell, Jean-François Zagury, Jonathan Marchini
* Contact : send an email to the OXSTATGEN mail list https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=OXSTATGEN
* Webpage : https://mathgen.stats.ox.ac.uk/shapeit
* Version : v2.r837
* Date : 18/11/2022 10:40:28
* LOGfile : [shapeit_log/qc_ogr_kro_nctb_cbd.cleansamples_chr22.phased.log]
MODE -phase : PHASING GENOTYPE DATA
* Autosome (chr1 ... chr22)
* Window-based model (SHAPEIT v2)
* Reference panel of haplotypes used
* MCMC iteration
* duoHMM is used to refine family haplotypes
Parameters :
* Seed : 1668796828
* Parallelisation: 4 threads
* Ref allele is NOT aligned on the reference genome
* MCMC: 35 iterations [7 B + 1 runs of 8 P + 20 M]
* Model: 100 states per window [100 H + 0 PM + 0 R + 0 COV ] / Windows of ~5.0 Mb / Ne = 15000
Reading site list in [prepare/chroms/qc_ogr_kro_nctb_cbd.cleansamples.chr22_endsTrimmed_misalignRepaired.bim]
* 21634 sites included
Reading sample list in [prepare/chroms/qc_ogr_kro_nctb_cbd.cleansamples.chr22_endsTrimmed_misalignRepaired.fam]
* 663 samples included
* 601 unrelateds / 25 duos / 5 trios in 628 different families
Reading genotypes in [prepare/chroms/qc_ogr_kro_nctb_cbd.cleansamples.chr22_endsTrimmed_misalignRepaired.bed]
* Plink binary file SNP-major mode
Reading sample list [/share/hennlab/projects/sa_ponderosa/rephaseSA/mergePOandAGR/hapslegendsample/mergedPOandAGR.sample]
* Column gender detected (idx=3)
* 10100 reference haplotypes included
* 0 males / 5050 females
Reading SNPs in [/share/hennlab/projects/sa_ponderosa/rephaseSA/mergePOandAGR/hapslegendsample/mergedPOandAGR_chr22.legend.gz]
* 43268 reference panel sites included
ERROR: Reference and Main panels are not well aligned:
* #Missing sites in reference panel = 0
* #Misaligned sites between panels = 0
* #Multiple alignments between panels = 21634
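One detail in the log may be worth checking: the reference panel reports 43268 sites, which is exactly twice the 21634 sites in the main panel, and every main-panel site is counted as a multiple alignment. That would be consistent with each position appearing twice in the merged legend, although the log alone does not prove it. Below is a minimal sketch for counting repeated positions, assuming the legend has a one-line header and the position in column 2 (the usual `id position a0 a1` layout). The function name `count_dup_positions` is made up for illustration.

```shell
# Sketch: count positions listed more than once in a gzipped legend file.
# Assumes a one-line header and the position in column 2 (id position a0 a1).
count_dup_positions() {
  gzip -dc "$1" | awk 'NR > 1 { seen[$2]++ }
    END { n = 0; for (p in seen) if (seen[p] > 1) n++; print n }'
}

# Hypothetical usage with the legend file named in the log above:
# count_dup_positions mergedPOandAGR_chr22.legend.gz
```

If the count comes back close to 21634, the duplication was probably introduced at the merge or conversion step rather than in the main panel.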
Just another note, the merging of data looked like this:
Filter AGR down to H3Africa SNPs:
h3africa_positions_2col.txt (chr and position columns) was used to slim the whole-genome data down to only the SNPs on the Illumina H3Africa array.
Using the following bcftools to merge by ID:
Using the following bcftools to convert merged vcf to HAP/LEGEND/SAMPLE files