Convert Beagle genotypes to VCF
2.0 years ago
bsp017 ▴ 50

Hi, I have a phased Beagle file that I generated with ANGSD v0.935. I would like to convert it to VCF with the Beagle utility program beagle2vcf.jar.

However, I keep getting this error:

 java -jar beagle2vcf.jar rs markers bgl_comb ? > vcf
Exception in thread "main" java.lang.IllegalArgumentException: Alleles in Beagle and markers file are inconsistent for allele "0" and marker rs1
    at beagleutil.Beagle2Vcf.alleleCode(Beagle2Vcf.java:169)
    at beagleutil.Beagle2Vcf.main(Beagle2Vcf.java:69)

When I check the Beagle and markers files, the alleles look consistent. Here is the markers file:

rs1 A G
rs2 T C
rs3 A G

and the phased Beagle file:

I id Ind0 Ind0 Ind1
M rs1 0 0 0
M rs2 3 3 3
M rs3 0 0 2
M rs4 1 1 1

Here is what the manual states:

usage: java -jar beagle2vcf.jar [chrom] [markers] [bgl] [missing] > [vcf]

where [chrom]   = chromosome identifier in output VCF file.
      [markers] = Beagle version 3 markers file.
      [bgl]     = Beagle version 3 genotypes file.
      [missing] = missing allele code in Beagle genotypes file.
      [vcf]     = output VCF file with a GT FORMAT field for each marker.

Markers in the markers file and Beagle genotypes file must be identical and sorted in order of increasing position. The first allele for a marker in the markers file will be the REF allele in the output VCF file. Alleles in the markers file must contain only 'A', 'C', 'G', and 'T' characters.
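The requirements in the manual excerpt can be checked before calling beagle2vcf.jar. Below is a minimal sketch, assuming whitespace-separated files, a markers file in which every non-numeric field after the marker ID is an allele, and "?" as the missing code. `check_beagle_markers` is a hypothetical helper, not part of Beagle, and it only compares marker IDs, row order and alleles; it does not check that positions increase.

```shell
#!/usr/bin/env bash
# Sketch only: check_beagle_markers is a hypothetical helper, not part of Beagle.
# Assumptions (adjust to your files):
#   - both files are whitespace-separated;
#   - markers file: marker ID in column 1, and every non-numeric field after it
#     is taken as an allele, so an optional numeric position column is skipped;
#   - Beagle 3 genotypes file: "M" lines hold the marker ID in column 2 and
#     allele codes from column 3 onwards;
#   - the missing allele code defaults to "?".
# Prints one line per problem found and nothing if no problem is found.
# It does not check that positions increase.
check_beagle_markers() {
  local markers="$1" bgl="$2" missing="${3:-?}"
  awk -v miss="$missing" '
    NR == FNR {                              # first file: the markers file
      ids[++n] = $1
      for (j = 2; j <= NF; j++)
        if ($j !~ /^[0-9]+$/) allowed[$1] = allowed[$1] " " $j
      next
    }
    $1 == "M" {                              # second file: Beagle marker lines
      m++
      if (!(m in ids))
        printf "%s: missing from markers file\n", $2
      else if (ids[m] != $2)
        printf "%s: markers file has %s in this row (order differs)\n", $2, ids[m]
      if (!($2 in allowed)) next
      for (i = 3; i <= NF; i++)
        if ($i != miss && index(allowed[$2] " ", " " $i " ") == 0)
          printf "%s: allele \"%s\" not in markers file (allowed:%s)\n", $2, $i, allowed[$2]
    }
    END {
      if (m < n) printf "markers file has %d marker(s) not in the Beagle file\n", n - m
    }
  ' "$markers" "$bgl"
}
```

Run it as `check_beagle_markers markers bgl_comb '?'`. With the numeric Beagle file shown in this question, the first line it prints is `rs1: allele "0" not in markers file (allowed: A G)`, which is the same mismatch beagle2vcf.jar reports.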

Can anyone spot what is inconsistent in my files?

Thanks,

James


Hi James,

I've been trying to do the same, and after exploring the source code in beagle2vcf.jar I noticed that it doesn't work with genotype likelihoods or with ANGSD-style genotypes coded as 0, 1, 2 and 3. Instead, it expects the alleles as A, C, G or T. For example, your phased Beagle file looks like this:

I id Ind0 Ind0 Ind1
M rs1 0 0 0
M rs2 3 3 3
M rs3 0 0 2
M rs4 1 1 1

but it expects something like this (ANGSD codes alleles as 0=A, 1=C, 2=G and 3=T):

I id Ind0 Ind0 Ind1
M rs1 A A A
M rs2 T T T
M rs3 A A G
M rs4 C C C

Not very elegant, but you can easily recode your Beagle file by doing something like this:

  1. Recode the allele columns (column 3 onwards) as nucleotides, dropping the header line

    cut --complement -f1,2 bgl_comb | tail -n +2 | tr '0123' 'ACGT' > tmp

  2. Generate header

    head -n 1 bgl_comb > header

  3. Save the first two columns (line type and marker ID) in a new file

    cut -f1,2 bgl_comb | tail -n +2 > tmp1

  4. Paste three first columns to recoded file:

    paste tmp1 tmp > tmp2

  5. Concatenate header and rest of file

    cat header tmp2 > final_beagle

  6. Tidy up

    rm tmp tmp1 tmp2 header
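The six steps above can also be collapsed into a single awk pass. This is a sketch, assuming a whitespace-separated Beagle 3 genotypes file (line type in column 1, marker ID in column 2, allele codes from column 3) and ANGSD's allele codes 0=A, 1=C, 2=G, 3=T. `recode_beagle` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: recode ANGSD allele codes (0=A, 1=C, 2=G, 3=T) as nucleotides in a
# Beagle 3 genotypes file. Only the allele columns (3 onwards) of "M" lines
# are changed; marker IDs, the "I" header line and any other value, such as
# the "?" missing code, keep their content. Output is tab-separated.
# recode_beagle is a hypothetical helper name.
recode_beagle() {
  awk 'BEGIN { OFS = "\t"; split("A C G T", base, " ") }
       $1 == "M" {
         for (i = 3; i <= NF; i++)
           if ($i ~ /^[0-3]$/) $i = base[$i + 1]   # 0 -> A, 1 -> C, 2 -> G, 3 -> T
       }
       { $1 = $1; print }                            # rebuild every line with tabs
  ' "$1"
}
```

For example, `recode_beagle bgl_comb > final_beagle`, then run the original command on `final_beagle`, quoting the missing code as `'?'` so the shell cannot expand it as a filename pattern. Unlike running `tr` over whole columns, this only rewrites fields that are exactly one of the four codes, so digits inside marker IDs or sample names are never touched.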

Hope this is useful!

Andrea
