Hi, I have a phased Beagle file that I generated with ANGSD v0.935, and I would like to convert it to VCF with the Beagle utility program beagle2vcf.jar.
However, I keep getting this error:
java -jar beagle2vcf.jar rs markers bgl_comb ? > vcf
Exception in thread "main" java.lang.IllegalArgumentException: Alleles in Beagle and markers file are inconsistent for allele "0" and marker rs1
at beagleutil.Beagle2Vcf.alleleCode(Beagle2Vcf.java:169)
at beagleutil.Beagle2Vcf.main(Beagle2Vcf.java:69)
When I check the Beagle and markers files, the alleles look consistent. Here is the markers file:
rs1 A G
rs2 T C
rs3 A G
and here is the phased Beagle file:
I id Ind0 Ind0 Ind1
M rs1 0 0 0
M rs2 3 3 3
M rs3 0 0 2
M rs4 1 1 1
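The comparison described above can be done mechanically. The sketch below uses check_alleles, a hypothetical helper that is not part of Beagle or beagle2vcf. It lists every allele code on an M line that is not among that marker's alleles in the markers file, which appears to be the condition behind the "inconsistent" error. It assumes whitespace-separated files, treats any column of the markers file made only of A/C/G/T as an allele (so a position column would be ignored), and skips the missing code ('?' by default). On the two files above it reports "0" for rs1, "3" for rs2, "0" and "2" for rs3, and that rs4 is absent from the markers excerpt shown.

```shell
#!/usr/bin/env bash
# Sketch of a consistency check between a Beagle 3 genotypes file and a
# markers file. check_alleles is a hypothetical helper, not part of Beagle.
# Usage: check_alleles MARKERS BGL [MISSING_CODE]
check_alleles() {
  local markers="$1" bgl="$2" missing="${3:-?}"
  awk -v miss="$missing" '
    # Pass 1 (markers file): column 1 is the marker ID; any later column made
    # only of A/C/G/T is treated as an allele (a numeric position is skipped).
    FNR == NR {
      known[$1] = 1
      for (i = 2; i <= NF; i++) if ($i ~ /^[ACGT]+$/) ok[$1, $i] = 1
      next
    }
    # Pass 2 (Beagle file): only "M" lines hold alleles, in columns 3 onward.
    $1 == "M" {
      if (!($2 in known)) { print "marker " $2 ": not in markers file"; next }
      for (i = 3; i <= NF; i++)
        # Report each unexpected (marker, allele) pair only once.
        if ($i != miss && !(($2, $i) in ok) && !seen[$2, $i]++)
          print "marker " $2 ": allele \"" $i "\" not in markers file"
    }
  ' "$markers" "$bgl"
}
```

An empty result means every allele code in the Beagle file is listed for its marker (or is the missing code).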
Here is what the manual states:
usage: java -jar beagle2vcf.jar [chrom] [markers] [bgl] [missing] > [vcf]
where [chrom] = chromosome identifier in output VCF file.
      [markers] = Beagle version 3 markers file.
      [bgl] = Beagle version 3 genotypes file.
      [missing] = missing allele code in Beagle genotypes file.
      [vcf] = output VCF file with a GT FORMAT field for each marker.

Markers in the markers file and Beagle genotypes file must be identical and sorted in order of increasing position. The first allele for a marker in the markers file will be the REF allele in the output VCF file. Alleles in the markers file must contain only 'A', 'C', 'G', and 'T' characters.
Can anyone spot what is inconsistent in my files?
Thanks,
James
Hi James,
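I've been trying to do the same, and after exploring the source code inside beagle2vcf.jar I noticed that it doesn't work with genotype likelihoods, or with alleles coded numerically as 0, 1, 2 and 3 the way ANGSD writes them. Instead, it expects the nucleotides themselves (A, C, G or T). For example, your phased Beagle file looks like this:
I id Ind0 Ind0 Ind1
M rs1 0 0 0
M rs2 3 3 3
M rs3 0 0 2
M rs4 1 1 1
but it's expecting something like:
I id Ind0 Ind0 Ind1
M rs1 A A A
M rs2 T T T
M rs3 A A G
M rs4 C C C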
Not very elegant, but you can easily recode your BEAGLE file by doing something like this:
Recode the numeric alleles as nucleotides (ANGSD codes them 0=A, 1=C, 2=G, 3=T), skipping the header line and the first two columns. Note that cut and paste split on tabs by default; for a space-delimited file add -d ' ' to both:
tail -n +2 bgl_comb | cut -f1,2 --complement | tr '0123' 'ACGT' > tmp
Generate header
head -n 1 bgl_comb > header
Save the first two columns (line type and marker ID) in a new file
tail -n +2 bgl_comb | cut -f1,2 > tmp1
Paste the first two columns back onto the recoded file:
paste tmp1 tmp > tmp2
Concatenate header and rest of file
cat header tmp2 > final_beagle
Tidy up
rm tmp tmp1 tmp2 header
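Because cut and paste split on tabs by default, a space-delimited file or a stray temporary file can trip up the step-by-step recipe. The same recoding can be sketched as a single awk pass. This is a sketch only, not tested against beagle2vcf itself: recode_bgl is a made-up name, and it assumes the Beagle 3 layout in James's file (line type in column 1, ID in column 2, alleles from column 3 on) and ANGSD's 0=A, 1=C, 2=G, 3=T coding. Only M lines are touched, so digits in header IDs such as Ind0 are left alone. Modified lines are re-joined with single spaces (awk's default output separator), on the assumption that beagle2vcf, like Beagle 3, accepts whitespace-delimited files.

```shell
#!/usr/bin/env bash
# Sketch: recode ANGSD-style numeric alleles (0=A, 1=C, 2=G, 3=T) in a
# Beagle 3 genotypes/phased file in one awk pass, with no temporary files.
# recode_bgl is a hypothetical helper name. Assumes column 1 is the line
# type (I, M, ...), column 2 the ID, and that only "M" lines carry alleles.
recode_bgl() {
  awk '
    BEGIN { split("A C G T", base, " ") }   # base[1]="A" ... base[4]="T"
    $1 == "M" {
      for (i = 3; i <= NF; i++)
        # Map code 0..3 to base[1..4]; other codes (e.g. "?") are kept as-is.
        if ($i ~ /^[0-3]$/) $i = base[$i + 1]
    }
    { print }
  ' "$1"
}
```

Usage, with the thread's file names and a placeholder chromosome label of 1 (the first argument becomes the CHROM column): `recode_bgl bgl_comb > final_beagle`, then `java -jar beagle2vcf.jar 1 markers final_beagle '?' > out.vcf`. Quoting '?' keeps the shell from expanding it as a one-character filename glob.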
Hope this is useful!
Andrea