Beagle gives error no matter what I do
9 weeks ago
Emilie ▴ 10

I am trying to run genome imputation, so I have my input file and my phased reference.

I run the following code:

java -jar ${EBROOTBEAGLE}/beagle.jar gt=chr4.vcf.gz ref=Chr4_phased_snp.vcf.gz out=chr4_imputed 

And I get the following error:

ERROR: REF field is not a sequence of A, C, T, G, or N characters at 4:116707718 [D]

So I troubleshoot: I filter my VCF file so it contains only SNPs (no indels), and I also try to filter out any sites whose REF is not A, C, G, T, or N with the following code:

bcftools view -i 'REF ~ "^[ACGTN]$"' input.vcf -o filtered_output.vcf
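As a cross-check that does not depend on bcftools, here is a minimal sketch that lists every record whose REF is not made up only of A, C, G, T, or N. The function name `scan_ref` and the toy file are hypothetical, and it assumes GNU gzip (where `-dcf` passes uncompressed input through unchanged) and a tab-delimited VCF body.

```shell
#!/usr/bin/env bash
# List records whose REF column contains anything other than A, C, G, T or N.
# Reads plain or gzip/bgzip-compressed VCFs using only gzip and awk.
scan_ref() {
    local vcf=$1
    gzip -dcf "$vcf" | awk -F'\t' -v f="$vcf" '
        /^#/ { next }                # skip header lines
        $4 !~ /^[ACGTN]+$/ {         # REF is the 4th column
            print f "\t" $1 ":" $2 "\t" $4
        }'
}

# Toy input (hypothetical): one valid record and one with an old-style "D" REF.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\n4\t100\t.\tA\tG\n4\t200\t.\tD\tT\n' > toy_target.vcf
scan_ref toy_target.vcf
```

It may be worth running this on both files exactly as they are named in the Beagle command (`chr4.vcf.gz` and `Chr4_phased_snp.vcf.gz`), since the filtering command above writes its result to a separate output file.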

I run it again and get the same thing, so I say, hmm, let me look at this specific site within my reference. Well, it does not exist. So just to be safe I run this to keep only biallelic SNPs:

bcftools view -m2 -M2 -v snps -o output_no_missing.vcf -O v input.vcf.gz

I run Beagle again and get the same error, so I say okay, let's just exclude the site 4:116707718 because it is the problem child, so I run:

java -jar ${EBROOTBEAGLE}/beagle.jar gt=chr4.vcf.gz ref=Chr4_phased_snp.vcf.gz out=chr4_imputed excludemarkers=file.txt

Where the file has the marker site. I still get the same error. I am out of troubleshooting ideas... It took me forever to find a suitable reference, so I really can't afford to just go find a new one; plus, I am doing this by chromosome, and chromosomes 1-3 worked just fine.
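For reference, a sketch of what the exclusion file might look like, assuming the Beagle documentation's format of one marker per line, given either as a VCF ID or as CHROM:POS. The file name `exclude_markers.txt` is hypothetical.

```shell
# Hypothetical file name; one marker per line, written as CHROM:POS.
printf '4:116707718\n' > exclude_markers.txt
cat exclude_markers.txt

# It would then be passed to Beagle like this (not run here):
# java -jar ${EBROOTBEAGLE}/beagle.jar gt=chr4.vcf.gz ref=Chr4_phased_snp.vcf.gz \
#     out=chr4_imputed excludemarkers=exclude_markers.txt
```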

Please help... Please :(

Beagle Impute SNPs • 586 views

you have some answers to validate: Indexing error using Salmon


Run it again and get the same thing, so i say hmmm let me look at this specific site within my reference. Well, it does not exist.

what is the output of

bcftools view --targets-overlap 2  --no-header -G --targets "4:116707716-116707720" Chr4_phased_snp.vcf.gz  | cut -f1-5

That returns nothing as well.

4:116707718 [D]

I suspect your VCF is pretty old, from the days when a deletion was marked 'D'.

can you show me:

bcftools view -G --no-header Chr4_phased_snp.vcf.gz | cut -f1-5 | awk -F '\t' '!($4 ~ /^[ATGC]*$/ && $5 ~ /^[ATGC]*$/)' | head

The header has the ##filedate=20230306 so I figured it was relatively recent.

That code also returns nothing. It is like beagle is hitting something that doesn't exist, which I know isn't possible but sure feels like it at the moment.
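The lookups in this thread all query `Chr4_phased_snp.vcf.gz`. A rough sketch for checking the same position in every file passed to Beagle, including the `gt=` file, could look like this. The function name `find_site` and the toy files are hypothetical, and it assumes GNU gzip and tab-delimited records.

```shell
#!/usr/bin/env bash
# Print any record at a given CHROM and POS, prefixed with the file it came from.
find_site() {
    local chrom=$1 pos=$2 vcf
    shift 2
    for vcf in "$@"; do
        gzip -dcf "$vcf" | awk -F'\t' -v c="$chrom" -v p="$pos" -v f="$vcf" '
            !/^#/ && $1 == c && $2 == p {
                print f "\t" $1 "\t" $2 "\t" $4 "\t" $5
            }'
    done
}

# Toy inputs (hypothetical): only the target file carries the site.
printf '#CHROM\tPOS\tID\tREF\tALT\n4\t116707718\t.\tD\tT\n' > toy_gt.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\n4\t116707000\t.\tA\tG\n' > toy_ref.vcf
find_site 4 116707718 toy_gt.vcf toy_ref.vcf
```

On the real data the call would be `find_site 4 116707718 chr4.vcf.gz Chr4_phased_snp.vcf.gz`. If the site turns up only in the first file, the reference panel is not the source of the error.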


Yes, I saw these and used them for my original troubleshooting. My issue is when I search the vcf file for the location that is in the error, it does not exist in the file. I will keep working on it. Thank you.


Update: I separated out the file for all sites before 116707718 and confirmed that there was no 116707718 in the file. I ran Beagle and got the same error. It's like there is a ghost. Please also note that I have moved on to other chromosomes and have run Beagle just fine with those files.
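A minimal sketch of the kind of split described above, keeping the header plus every record before a cutoff position. The function name `subset_before` and the toy file are hypothetical, and it assumes a single-chromosome, tab-delimited VCF and GNU gzip.

```shell
#!/usr/bin/env bash
# Keep header lines and records whose POS is strictly below the cutoff.
subset_before() {
    local vcf=$1 cutoff=$2
    gzip -dcf "$vcf" | awk -F'\t' -v p="$cutoff" '/^#/ || ($2 + 0 < p + 0)'
}

# Toy input (hypothetical): three records, only the first lies before the cutoff.
printf '#CHROM\tPOS\tID\tREF\tALT\n4\t100\t.\tA\tG\n4\t116707718\t.\tD\tT\n4\t200000000\t.\tC\tT\n' > toy_chr4.vcf
subset_before toy_chr4.vcf 116707718
```

If Beagle reports the same position after one input has been cut this way, that suggests the record is being read from the other input file.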
