I am trying to run genome imputation, so I have my input file and my phased reference.
I run the following code:
java -jar ${EBROOTBEAGLE}/beagle.jar gt=chr4.vcf.gz ref=Chr4_phased_snp.vcf.gz out=chr4_imputed
And I get the following error:
ERROR: REF field is not a sequence of A, C, T, G, or N characters at 4:116707718 [D]
So I troubleshoot: I filter my VCF file so it contains only SNPs (no indels), and I also try to filter out any sites whose REF is not A, C, G, T, or N, with the following code:
bcftools view -i 'REF ~ "^[ACGTN]$"' input.vcf -o filtered_output.vcf
I run it again and get the same thing, so I look for this specific site in my reference. It does not exist. Just to be safe, I run this to get rid of anything that is not a biallelic SNP:
bcftools view -m2 -M2 -v snps -o output_no_missing.vcf -O v input.vcf.gz
I run Beagle again and get the same error, so I decide to just exclude the site 4:116707718, since it is the problem child, and run:
java -jar ${EBROOTBEAGLE}/beagle.jar gt=chr4.vcf.gz ref=Chr4_phased_snp.vcf.gz out=chr4_imputed excludemarkers=file.txt
where file.txt contains the marker site. I still get the same error. I am out of troubleshooting ideas... It took me forever to find a suitable reference, so I really can't afford to just go find a new one; plus, I am doing this by chromosome, and chromosomes 1-3 worked just fine.
Please help... Please :(
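The error message does not say which of the two files the offending record came from, so it may be worth looking for the position in both the gt= file and the ref= file rather than only the reference. Below is a minimal sketch of that check, assuming standard gzip and awk are available and that the chromosome is named 4 (as in the error message) in both files. The helper name find_site is hypothetical, not part of Beagle or bcftools.

```shell
# Print every record at CHROM:POS in a VCF (plain text or gzip/bgzip-compressed).
# find_site is an illustrative helper name, not an existing tool.
find_site() {
    vcf=$1; chrom=$2; pos=$3
    # gzip -cdf decompresses gzip input and passes plain text through unchanged
    gzip -cdf "$vcf" | awk -F'\t' -v c="$chrom" -v p="$pos" \
        '!/^#/ && $1 == c && $2 == p'
}

# Check both files handed to Beagle (file names taken from the commands above).
for f in chr4.vcf.gz Chr4_phased_snp.vcf.gz; do
    [ -e "$f" ] || continue
    echo "== $f"
    find_site "$f" 4 116707718
done
```

If a record shows up in only one of the files, that is the file to filter. If the chromosome is written differently (for example chr4), the second argument would need to match it.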
you have some answers to validate: Indexing error using Salmon
what is the output of
That returns nothing as well.
I suspect your VCF is pretty old, from the days when a deletion was marked 'D'.
can you show me:
The header has the
##filedate=20230306
so I figured it was relatively recent. That code also returns nothing. It is like Beagle is hitting something that doesn't exist, which I know isn't possible, but it sure feels like it at the moment.
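One way to test the legacy 'D' allele idea without depending on a single coordinate is to list every record whose REF contains anything other than uppercase A, C, G, T, or N, in each file that Beagle actually reads. The sketch below assumes gzip and awk are available; bad_ref is a hypothetical helper name, and whether Beagle would also reject lowercase bases is not something this thread confirms, so lowercase REF values are reported as well. Note that the filtering commands above write to filtered_output.vcf and output_no_missing.vcf, while the Beagle commands as posted still name chr4.vcf.gz and Chr4_phased_snp.vcf.gz, so it may be worth confirming that the filtered files are the ones being passed in.

```shell
# List CHROM:POS and REF for every record whose REF is not made up
# solely of uppercase A, C, G, T, or N.
# bad_ref is an illustrative helper name, not an existing tool.
bad_ref() {
    gzip -cdf "$1" | awk -F'\t' \
        '!/^#/ && $4 !~ /^[ACGTN]+$/ { print $1 ":" $2 "\t" $4 }'
}

# Scan the exact files named in the Beagle command.
for f in chr4.vcf.gz Chr4_phased_snp.vcf.gz; do
    [ -e "$f" ] || continue
    echo "== $f"
    bad_ref "$f"
done
```

An empty listing for both files would suggest Beagle is reading a different file than the one being inspected.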
duplicate of Error while running BEAGLE for genotype imputation and Imputing missing genotypes using Beagle
Yes, I saw these and used them for my original troubleshooting. My issue is when I search the vcf file for the location that is in the error, it does not exist in the file. I will keep working on it. Thank you.
Update: I split out the part of the file containing all sites before 116707718 and confirmed that there was no 116707718 in it. I ran Beagle and got the same error. It's like there is a ghost. Please also note that I have moved on to other chromosomes and have run Beagle just fine with those files.