I am using Eagle to phase plink input. I maintain --keep-allele-order throughout my workflow. Eagle outputs .haps (Oxford phased haplotype format), which I convert to VCF with shapeit. I did some sanity checks and found that this process reverses the REF/ALT designation and the genotypes for every single one of my SNPs and indels. Comparing alt allele frequencies to the reference strongly suggests that I do not have a build issue, so my initial workaround was to run
bcftools norm --check-ref -s -f hg19.fa input.vcf > output.vcf
to flip everything back after converting to VCF with shapeit. This perfectly flips every single one of my SNPs, but does not seem to do the same for indels. Is there a way to do this for indels as well? My alternative thought is to use BCF as input for Eagle, which should avoid the plink/Oxford formats completely.
First off, thank you.
I don't really understand what you mean by importing the initial file incorrectly. I just checked again by converting the plink files I use as Eagle input to VCF format using:
I checked REF/ALT for every position in the resulting {eagle_input_vcf} compared to the very first VCF in my workflow (before ever going into plink). These REF/ALT alleles are all 100% in the order they should be, so I don't think I forgot --keep-allele-order, although point taken.
I used the {eagle_input_plink} files to phase with Eagle, which outputs Oxford format. I convert Oxford format to VCF using:
From sanity checks I can tell that 100% of the REF/ALT alleles are flipped between {eagle_phased.vcf} and the test file {eagle_input_vcf} I made above. Since the flip was perfectly systematic, I wasn't too worried about using bcftools norm --check-ref -s to flip everything back, but now I wonder if there is something deeper that I am missing. Do you have any recommendations?
The only other thing I can think of is that either Eagle or shapeit is making assumptions about the order of REF/ALT in the input, which is sort of out of my control.
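For anyone who wants to repeat the sanity check described above, here is a minimal Python sketch of how the comparison could be done. The function names (read_alleles, classify_sites) are made up for illustration, and it assumes uncompressed, tab-delimited VCF lines with at most one record per position; it is not the exact check used in this thread.

```python
def read_alleles(vcf_lines):
    """Map (CHROM, POS) -> (REF, ALT) from the data lines of a VCF.

    Assumption: at most one record per position; a later record at the
    same position would overwrite an earlier one.
    """
    alleles = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        alleles[(chrom, int(pos))] = (ref, alt)
    return alleles


def classify_sites(before, after):
    """Count shared sites whose REF/ALT are identical, swapped, or different."""
    counts = {"same": 0, "swapped": 0, "other": 0}
    for key, (ref, alt) in before.items():
        if key not in after:
            continue  # site missing from the second file
        if after[key] == (ref, alt):
            counts["same"] += 1
        elif after[key] == (alt, ref):
            counts["swapped"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Toy records standing in for the pre-phasing and post-phasing VCFs.
    before = read_alleles([
        "1\t100\trs1\tA\tG\t.\t.\t.",
        "1\t200\tindel1\tAT\tA\t.\t.\t.",
    ])
    after = read_alleles([
        "1\t100\trs1\tG\tA\t.\t.\t.",
        "1\t200\tindel1\tA\tAT\t.\t.\t.",
    ])
    print(classify_sites(before, after))
```

With real files, the lists would be replaced by open file handles. A result where every shared site lands in "swapped" would match the systematic flip described here.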
I alluded to the fact that Oxford format does not define whether REF is first or last. It looks like Eagle and Shapeit make different assumptions from each other.
Thanks, that was only something I was starting to think about after you made that comment. I am fairly naive about Oxford format. I think I might end up just writing a script to flip all the indels. I usually prefer a high-level tool to avoid unintended corruption, but I might not have an option here.
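A script along those lines might look like the rough Python sketch below. It is only an illustration under several assumptions: records are biallelic with plain sequence alleles (no symbolic ALT such as <DEL>), GT is the first FORMAT subfield, every indel in the file really is flipped, and INFO values that depend on allele order (AC, AF and so on) are not updated. The function names are hypothetical, and it does not reproduce what bcftools norm does internally. Checking REF against hg19.fa before and after would be a sensible extra safeguard.

```python
def flip_genotype(sample_field):
    """Swap alleles 0 and 1 in the GT subfield, keeping '|' and '/' as-is."""
    parts = sample_field.split(":")
    swap = {"0": "1", "1": "0"}
    parts[0] = "".join(swap.get(ch, ch) for ch in parts[0])
    return ":".join(parts)


def flip_record(line):
    """Return a biallelic VCF data line with REF/ALT and genotypes swapped."""
    fields = line.rstrip("\n").split("\t")
    if "," in fields[4]:
        raise ValueError("multiallelic site; this sketch is biallelic only")
    fields[3], fields[4] = fields[4], fields[3]
    # Only touch sample columns when GT is present as the first FORMAT key.
    if len(fields) > 9 and fields[8].split(":")[0] == "GT":
        fields[9:] = [flip_genotype(s) for s in fields[9:]]
    return "\t".join(fields)


def flip_indels(vcf_lines):
    """Yield VCF lines, flipping only records where REF and ALT differ in length."""
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("#") or not line:
            yield line  # header and blank lines pass through unchanged
            continue
        ref, alt = line.split("\t")[3:5]
        yield flip_record(line) if len(ref) != len(alt) else line


if __name__ == "__main__":
    demo = [
        "##fileformat=VCFv4.2",
        "1\t100\trs1\tG\tA\t.\tPASS\t.\tGT\t0|1",
        "1\t200\tindel1\tA\tAT\t.\tPASS\t.\tGT\t0|1\t1|1\t0|0",
    ]
    for out in flip_indels(demo):
        print(out)
```

In the demo, the SNP line passes through untouched, while the indel comes out with REF=AT, ALT=A and each phased genotype mirrored (0|1 becomes 1|0, 1|1 becomes 0|0, 0|0 becomes 1|1).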