Hello,
I need to use SHAPEIT for phasing only, since I will run a compound heterozygous (CH) analysis of rare recessive variants. I will not perform imputation.
I am running SHAPEIT, and I see in the log file it says:
Parameters :
* Seed : 1442251531
* Parallelisation: 12 threads
* Ref allele is NOT aligned on the reference genome
* MCMC: 35 iterations [7 B + 1 runs of 8 P + 20 M]
I still get the *.haps file with the haplotypes for the CH analysis, but I am not sure whether I am doing this correctly.
Is it OK to have the "Ref allele is NOT aligned on the reference genome" notice in my log file?
I have one more question.
My input files are in PLINK PED/MAP format, and the SHAPEIT website (http://shapeit.fr/pages/m03_phasing/input.html) says that SHAPEIT treats "0" as missing data.
It suggests changing the missing-data character, for example to "N", with the --missing-code option:
shapeit --input-ped chr20.unphased.ped chr20.unphased.map -M chr20.gmap.gz --output-max chr20.phased --missing-code N
However, adding --missing-code N gives me this error:
ERROR: Non biallelic site pos=24118582 a=0
So I dropped --missing-code N and ran SHAPEIT as follows:
shapeit --input-ped chr20.unphased.ped chr20.unphased.map -M chr20.gmap.gz --output-max chr20.phased
Would that be ok?
Thank you so much
It might mean that not all of your panel/reference alleles were used. This could happen if your PLINK files are not all on the reference strand; if that is the problem, see https://github.com/endrebak/snpflip for a solution.
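To make the strand idea concrete, here is a minimal Python sketch of how a site's alleles can be compared with the reference base. It is not snpflip's actual implementation; the function name and the returned labels are made up for illustration, and the reference base is assumed to have been looked up separately from a genome FASTA.

```python
# Hypothetical helper, not part of snpflip or SHAPEIT.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify_strand(ref_base, allele1, allele2):
    """Classify a SNP's two alleles relative to the reference base."""
    ref = ref_base.upper()
    alleles = {allele1.upper(), allele2.upper()}
    # Anything other than plain A/C/G/T cannot be judged here.
    if ref not in COMPLEMENT or not alleles <= set(COMPLEMENT):
        return "unknown"
    flipped = {COMPLEMENT[a] for a in alleles}
    # A/T and C/G sites look identical on both strands.
    if alleles == flipped:
        return "ambiguous"
    if ref in alleles:
        return "forward"
    if ref in flipped:
        return "reverse"
    return "mismatch"


if __name__ == "__main__":
    print(classify_strand("A", "A", "G"))
    print(classify_strand("T", "A", "G"))
    print(classify_strand("A", "A", "T"))
```

Sites reported as "reverse" would be candidates for flipping before phasing, while "ambiguous" sites cannot be resolved from the alleles alone.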
Your title is too general btw.
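On the --missing-code question: one possible reading of "Non biallelic site pos=24118582 a=0" is that the PED file codes missing genotypes as 0, so once N is declared as the missing code, 0 is counted as a third allele at that site. The sketch below is a rough way to check this on the PED file itself, assuming the standard layout of six leading columns followed by allele pairs; the function names are hypothetical and this is not how SHAPEIT does it internally.

```python
from collections import Counter


def allele_symbols_per_site(ped_lines, missing_codes=("0",)):
    """Count the non-missing allele symbols seen at each marker of a PED file."""
    sites = []
    for line in ped_lines:
        fields = line.split()
        if not fields:
            continue
        # The first six PED columns are FID, IID, PID, MID, sex, phenotype.
        alleles = fields[6:]
        if len(alleles) % 2:
            raise ValueError("odd number of allele columns")
        n_sites = len(alleles) // 2
        if not sites:
            sites = [Counter() for _ in range(n_sites)]
        elif len(sites) != n_sites:
            raise ValueError("rows have different numbers of markers")
        for i in range(n_sites):
            for symbol in alleles[2 * i:2 * i + 2]:
                if symbol not in missing_codes:
                    sites[i][symbol] += 1
    return sites


def non_biallelic_sites(ped_lines, missing_codes=("0",)):
    """Return 0-based marker indices with more than two allele symbols."""
    counts = allele_symbols_per_site(ped_lines, missing_codes)
    return [i for i, c in enumerate(counts) if len(c) > 2]


if __name__ == "__main__":
    # Toy data, not taken from the real chr20 file.
    ped = [
        "F1 I1 0 0 1 -9 A G C C",
        "F2 I2 0 0 2 -9 A A 0 0",
        "F3 I3 0 0 1 -9 G G C T",
    ]
    print(non_biallelic_sites(ped, missing_codes=("0",)))
    print(non_biallelic_sites(ped, missing_codes=("N",)))
```

If the real file only shows extra symbols when "0" is not treated as missing, then running without --missing-code N is consistent with how the data are coded. The marker index maps to the corresponding row of the MAP file.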