Unable to impute with Beagle after filtering a VCF with vcftools
3.7 years ago
ziv_attia • 0

For a reason I can't understand, I am unable to impute a VCF with Beagle, but only after filtering it with vcftools. The filtering is really straightforward and I have used it hundreds of times. This is how I filter the VCF:

vcftools --gzvcf ${path}${file}.vcf.gz --remove-indels --max-missing ${maxM} --maf ${maf} --minQ ${minQ} --out ${path}${file}_filterd_minQ${minQ}_maxM${maxM}_maf${maf} --recode #--recode-INFO-all

And this is how I run Beagle:

java -Xmx144g -jar /home/pogoda/software/BEAGLE/beagle.03Jul19.b33.jar gt=${file} nthreads=36  out=IMPUTED_${file}
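A minimal sketch of chaining the two commands above, assuming Beagle's `gt=` argument is meant to point at the file vcftools just wrote. With `--recode`, vcftools writes its output to `<--out prefix>.recode.vcf`, so that is the name handed to Beagle here. The function name `filter_and_impute` and the `DRY_RUN` switch (print the commands instead of running them) are hypothetical additions for illustration; the flags and paths are the ones from the question.

```shell
#!/usr/bin/env bash
# Sketch: run the vcftools filter, then pass its *.recode.vcf output to Beagle.
# Hypothetical additions: the function names and DRY_RUN (prints commands only).
set -euo pipefail

run() {
  # Print the command when DRY_RUN=1, otherwise execute it.
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

filter_and_impute() {
  local path=$1 file=$2 maxM=$3 maf=$4 minQ=$5
  local prefix="${path}${file}_filterd_minQ${minQ}_maxM${maxM}_maf${maf}"

  # vcftools --recode writes "<prefix>.recode.vcf"
  run vcftools --gzvcf "${path}${file}.vcf.gz" --remove-indels \
      --max-missing "$maxM" --maf "$maf" --minQ "$minQ" \
      --out "$prefix" --recode

  # Beagle reads the filtered file, not the original one
  run java -Xmx144g -jar /home/pogoda/software/BEAGLE/beagle.03Jul19.b33.jar \
      gt="${prefix}.recode.vcf" nthreads=36 out="IMPUTED_${file}"
}
```

For example, `DRY_RUN=1 filter_and_impute data/ sample 0.9 0.05 30` only prints the two commands, which is a cheap way to confirm that the `gt=` path matches the file vcftools produces before committing 144 GB of memory to a run.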

I can impute the unfiltered VCF without any problems, so it must be something to do with the filtering. Any idea what the issue with vcftools could be?

genomics bioinformatics vcftools beagle

Is there an error message from Beagle?


This is the error I get:

ERROR: genotype is missing allele separator:


Okay, then what does an example line from your VCF file look like? The error suggests you're missing a field.


What do you mean when you say you are 'unable to do it'?


It crashes immediately after I start running it.

3.5 years ago
roselaw27 • 0

If your error message is something like this:

java.lang.IllegalArgumentException: ERROR: inconsistent number of alleles for sample LH05 at marker [1 1088185 . A G]

then we've run into the same problem. I realized that the vcftools filtering is omitting genotype information, which is why Beagle can't recognize the alleles. To be more specific, I extracted the line for chr1, 223216, and my diploid sample LH05 had:

.:0,0:.:.:0|1:1088185_A_G:.:1088185

whereas the other (normal) samples had something like:

0|1:2,4:6:72:0|1:1088185_A_G:162,0,72:1088185

0/0:14,0:14:42:.:.:0,42,390:.

./.:1,2:3:.:.:.:0,0,0:.

The first colon-separated item is the genotype (GT) field, and it should contain two alleles because my samples are diploid.

I checked my files and found that the same thing also happens with a single 1 in the GT field.
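To make the diploid rule concrete, here is a small bash sketch: take one sample column, keep the GT subfield (the text before the first `:`), split it on `/` (unphased) or `|` (phased), and count the pieces. The helper name `gt_allele_count` is hypothetical, and the inputs are the sample strings quoted above.

```shell
#!/usr/bin/env bash
# Sketch: count the alleles in the GT subfield of one VCF sample column.
# A diploid call such as 0|1, 0/0 or ./. gives 2; a bare "." or "1" gives 1.

gt_allele_count() {
  local sample_field=$1
  local gt=${sample_field%%:*}             # keep the text before the first ":"
  local -a alleles
  IFS='/|' read -r -a alleles <<< "$gt"    # split on either allele separator
  echo "${#alleles[@]}"
}

for s in '.:0,0:.:.:0|1:1088185_A_G:.:1088185' \
         '0|1:2,4:6:72:0|1:1088185_A_G:162,0,72:1088185' \
         '0/0:14,0:14:42:.:.:0,42,390:.' \
         './.:1,2:3:.:.:.:0,0,0:.'; do
  printf '%s -> %s allele(s)\n' "${s%%:*}" "$(gt_allele_count "$s")"
done
```

Run as-is, the loop prints `. -> 1 allele(s)` for the LH05 string and `2 allele(s)` for each of the three normal ones; a lone `1` likewise gives 1, which is the kind of call Beagle rejects.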

The cause is that I used a vcftools filter (--maf) to process the output of the GATK VariantFiltration step. This is not the first time I have run into this problem with vcftools (the last time I used --min-alleles and --max-alleles). That's why your VCF runs smoothly with Beagle when it has not been filtered with vcftools. I don't understand why other software never caught this error; probably they treat . as ./. or 0/0 and continue anyway. This could be a problem if your statistic is sensitive to missing alleles.

Anyway, if people are using vcftools for filtering, PLEASE CHECK your results.
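One way to act on this advice before handing a file to Beagle is a quick scan for genotypes without an allele separator. This is a sketch under stated assumptions: the function name `find_bad_gt` is hypothetical, it expects an uncompressed VCF such as the `.recode.vcf` that vcftools writes, and it relies on GT being the first FORMAT subfield, which the VCF specification requires whenever GT is present.

```shell
#!/usr/bin/env bash
# Sketch: print CHROM, POS, sample name and GT (tab-separated) for every
# genotype whose GT subfield contains neither "/" nor "|".
# Assumes an uncompressed VCF in which GT is the first FORMAT key.

find_bad_gt() {
  local vcf=$1
  awk -F'\t' -v OFS='\t' '
    /^##/ { next }                                   # skip meta-information lines
    /^#CHROM/ { for (i = 10; i <= NF; i++) name[i] = $i; next }  # sample names
    $9 == "GT" || $9 ~ /^GT:/ {
      for (i = 10; i <= NF; i++) {
        n = index($i, ":")
        gt = (n ? substr($i, 1, n - 1) : $i)         # GT subfield
        if (gt !~ /[\/|]/) print $1, $2, name[i], gt # no separator found
      }
    }' "$vcf"
}
```

Empty output means every genotype has two alleles. For a compressed file, something like `find_bad_gt <(zcat file.vcf.gz)` should work. Note that it also flags genuinely haploid calls, if your data contains any.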


My Beagle version is beagle.25Nov19.28d.jar.


I got the same error when using the .recode.vcf file. I even tried changing . to ./. and then recoding again, but it showed the same error. I used gt=filename out=imputed.vcf. Could you guide me on which parameters to use for filtering on minor allele frequency and --minQ? Is there any other way?
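If the `.` to `./.` replacement was applied to whole lines, it would also alter the ID, QUAL and INFO columns, and it would leave lone allele calls such as `1` untouched. A more targeted sketch edits only the GT subfield of the sample columns. It rests on an explicit assumption: any GT without an allele separator is set to missing (`./.`), so Beagle imputes it instead of trusting it, which discards whatever that call encoded. The function name `fix_haploid_gt` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: rewrite every GT subfield lacking "/" or "|" to "./." (missing).
# ASSUMPTION: treating a bare "." or a lone allele such as "1" as missing is
# an illustrative choice. Only the GT subfield changes; AD, DP, etc. are kept.

fix_haploid_gt() {
  local in_vcf=$1 out_vcf=$2
  awk -F'\t' -v OFS='\t' '
    /^#/ { print; next }                       # copy header lines unchanged
    $9 == "GT" || $9 ~ /^GT:/ {                # only when GT is the first key
      for (i = 10; i <= NF; i++) {
        n = index($i, ":")
        gt   = (n ? substr($i, 1, n - 1) : $i) # GT subfield
        rest = (n ? substr($i, n) : "")        # remaining subfields, may be empty
        if (gt !~ /[\/|]/) $i = "./." rest
      }
    }
    { print }' "$in_vcf" > "$out_vcf"
}
```

For example, `fix_haploid_gt file.recode.vcf file.fixed.vcf` turns `.:0,0` into `./.:0,0` and `1:0,3` into `./.:0,3` while leaving diploid calls and all other columns as they were.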
