Understanding Phasing with ShapeIt
2.0 years ago
nanodano ▴ 30

Hi everyone,

I'm currently trying to rephase data using a file that merges two datasets: (1) my phased parent-offspring samples from the first round of phasing and (2) the original reference panel (African Genome Diversity Project).

When I rerun shapeit with the same steps and flags I used previously, I get the output shown below. Does anyone know why it would not complete? I've already removed mismatched SNPs, flipped strands where needed, etc. The input should be clean and ready to go, so I suspect the problem is the newly merged reference panel. Any help is appreciated, thanks.

shapeit -B ${INPUT}_endsTrimmed -M ${MAP} --input-ref ${REF_HAP} ${REF_LEG} ${REF_SAMP} --duohmm -W 5 -O ${OUTPUT} --output-log ${LOG}.log -T 54


    Segmented HAPlotype Estimation & Imputation Tool
      * Authors : Olivier Delaneau, Jared O'Connell, Jean-François Zagury, Jonathan Marchini
      * Contact : send an email to the OXSTATGEN mail list https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=OXSTATGEN
      * Webpage : https://mathgen.stats.ox.ac.uk/shapeit
      * Version : v2.r837
      * Date    : 18/11/2022 10:40:28
      * LOGfile : [shapeit_log/qc_ogr_kro_nctb_cbd.cleansamples_chr22.phased.log]

    MODE -phase : PHASING GENOTYPE DATA
      * Autosome (chr1 ... chr22)
      * Window-based model (SHAPEIT v2)
      * Reference panel of haplotypes used
      * MCMC iteration
      * duoHMM is used to refine family haplotypes

    Parameters :
      * Seed : 1668796828
      * Parallelisation: 4 threads
      * Ref allele is NOT aligned on the reference genome
      * MCMC: 35 iterations [7 B + 1 runs of 8 P + 20 M]
      * Model: 100 states per window [100 H + 0 PM + 0 R + 0 COV ] / Windows of ~5.0 Mb / Ne = 15000

    Reading site list in [prepare/chroms/qc_ogr_kro_nctb_cbd.cleansamples.chr22_endsTrimmed_misalignRepaired.bim]
      * 21634 sites included

    Reading sample list in [prepare/chroms/qc_ogr_kro_nctb_cbd.cleansamples.chr22_endsTrimmed_misalignRepaired.fam]
      * 663 samples included
      * 601 unrelateds / 25 duos / 5 trios in 628 different families

    Reading genotypes in [prepare/chroms/qc_ogr_kro_nctb_cbd.cleansamples.chr22_endsTrimmed_misalignRepaired.bed]
      * Plink binary file SNP-major mode

    Reading sample list [/share/hennlab/projects/sa_ponderosa/rephaseSA/mergePOandAGR/hapslegendsample/mergedPOandAGR.sample]
      * Column gender detected (idx=3)
      * 10100 reference haplotypes included
      * 0 males / 5050 females

    Reading SNPs in [/share/hennlab/projects/sa_ponderosa/rephaseSA/mergePOandAGR/hapslegendsample/mergedPOandAGR_chr22.legend.gz]
      * 43268 reference panel sites included

    ERROR: Reference and Main panels are not well aligned:
      * #Missing sites in reference panel = 0
      * #Misaligned sites between panels = 0
      * #Multiple alignments between panels = 21634
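One thing the log itself hints at: the reference legend contains 43268 sites, exactly twice the 21634 sites in the main panel, and every main-panel site is reported as a multiple alignment. That would be consistent with each position appearing twice in the legend, although the post does not confirm this. A minimal sketch for checking it is below. The function name `count_dup_positions` is made up for illustration, and it assumes the legend has a header line with the position in the second column (`id position a0 a1`).

```shell
# Count positions that occur more than once in a gzipped legend file.
# Assumption: line 1 is a header and column 2 holds the position.
count_dup_positions() {
    gzip -dc "$1" | awk '
        NR > 1 { seen[$2]++ }
        END {
            n = 0
            for (p in seen) if (seen[p] > 1) n++
            print n
        }'
}

# Hypothetical usage with the legend named in the log:
# count_dup_positions mergedPOandAGR_chr22.legend.gz
```

A result of 0 would rule this explanation out; a result close to the number of main-panel sites would point at duplicated records in the merged panel.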

One more note: the data were merged as follows.

First, filter AGR down to the H3Africa SNPs:

vcftools --gzvcf /share/hennlab/data/genomes/AGR/haplegendsample/out_haplegendsample2vcf/AGR.without_related_chr1.vcf.gz --positions h3africa_positions_2col.txt --out AGR.without_related_h3A_chr1

h3africa_positions_2col.txt (chromosome and position columns) was used to restrict the whole-genome data to the SNPs on the Illumina H3Africa array.

Using the following bcftools to merge by ID:

bcftools merge -m id $PO/SA_hg37.chr${no}.recode.vcf.gz $AGR/AGR.without_related_h3A_chr${no}.recode.vcf.gz --output $OUT/mergedPOandAGR.chr${no}.vcf.gz

Using the following bcftools command to convert the merged VCF to HAP/LEGEND/SAMPLE files:

bcftools convert mergedPOandAGR.chr${no}.vcf.gz --haplegendsample $HAPS/mergedPOandAGR.chr${no}
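A possible cause worth checking, stated here as an assumption and not something the post confirms: with `-m id`, `bcftools merge` combines records by ID, so a site whose ID differs between the two input files could remain as two separate records at the same position, and both would be carried into the legend. The sketch below lists positions that carry more than one record, with the IDs seen there. The function name `list_split_sites` is hypothetical, and it expects tab-separated CHROM, POS, ID lines on stdin, such as those produced by `bcftools query`.

```shell
# List positions carrying more than one record, with the IDs found there.
# Input on stdin: tab-separated CHROM, POS, ID (one record per line).
list_split_sites() {
    awk -F'\t' '
        {
            key = $1 ":" $2
            n[key]++
            if (n[key] == 1) ids[key] = $3
            else             ids[key] = ids[key] "," $3
        }
        END { for (k in n) if (n[k] > 1) print k "\t" ids[k] }
    ' | sort
}

# Hypothetical usage on the merged file for chromosome 22:
# bcftools query -f '%CHROM\t%POS\t%ID\n' mergedPOandAGR.chr22.vcf.gz | list_split_sites
```

If the listed positions show pairs of differing IDs, harmonising the IDs in both files before merging would be one thing to try.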