beagle 4 exception: Duplicate marker
10.0 years ago
galbarel ▴ 10

I'm trying to use BEAGLE 4 to do an IBD analysis.

I'm running the command: java -jar beagle.r1398.jar gt=vcf_format.vcf out=beagle ibd=true

where vcf_format.vcf is a file I generated using PLINK 1.9 (with --recode vcf-iid) from my PLINK format files (bed, bim, fam).

BEAGLE stops with an exception:

Duplicate marker: 1    72765116    rs2568958_r    A    G

I've searched for this marker in both the VCF file and the BIM file, and it appears only once in both.

Can anyone suggest what to do? Or is there another way to check for duplicates in my files?

Thanks

vcf IBD plink beagle • 5.9k views
10.0 years ago

BEAGLE 4's marker equality check is based on chromosome, position, alleles, and the INFO field's END key if one is present; it does not consider marker ID. (For the curious, this is in src/vcf/Marker.java lines 473-509.) So you probably have another marker in your file with the same chrom/pos/alleles, and these could very well be genuine duplicates that you want to resolve with e.g. PLINK --exclude.

I will add a --detect-duplicate-var flag to PLINK today or tomorrow to help out here.
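
Until such a flag is available, records that collide under the comparison described above can be listed straight from the VCF. The following is a minimal sketch, not BEAGLE's own logic: it assumes the key is the exact text of the CHROM, POS, REF and ALT columns and ignores the INFO END key. The function name find_dup_markers is a hypothetical helper, not part of PLINK or BEAGLE.

```shell
# List VCF records that share CHROM, POS, REF and ALT with an earlier record.
# Assumption: the key is the exact text of columns 1, 2, 4 and 5; the INFO
# END key that BEAGLE also compares is ignored here.
# find_dup_markers is a hypothetical helper name, not a PLINK or BEAGLE command.
find_dup_markers() {
    awk -F'\t' '
        /^#/ { next }                       # skip meta and header lines
        {
            key = $1 FS $2 FS $4 FS $5      # CHROM, POS, REF, ALT
            if (key in first_id)
                # print the shared key, the ID seen first, then the later ID
                print $1, $2, $4, $5, first_id[key], $3
            else
                first_id[key] = $3          # remember the ID of the first record
        }
    ' OFS='\t' "$1"
}

# Usage: find_dup_markers vcf_format.vcf
```

Each output line gives the shared chromosome, position and alleles, followed by the ID of the first record and the ID of the later record, so a marker that appears only once by ID can still show up here if a differently named variant sits at the same position with the same alleles.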


Implemented as --list-duplicate-vars in the December 9 development build. Sample workflow, if you don't have multiple variants with the same ID:

plink --bfile my_dataset --list-duplicate-vars ids-only suppress-first --out tmp
plink --bfile my_dataset --exclude tmp.dupvar --recode vcf-iid --out vcf_format
java -jar beagle.r1398.jar gt=vcf_format.vcf ...

Thank you chrchang523.

Here is an R script that is ready to run.

# Set working directory
setwd("YOUR WORKING DIRECTORY")

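# Function to run PLINK with the given options (replace the path with your PLINK executable)
runPLINK <- function(PLINKoptions = "") system(paste("plink directory/plink.exe", PLINKoptions))
# Sanity check: with no options, PLINK should just print its banner
runPLINK()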

dir.create("results")

runPLINK("--vcf VCF_file.vcf --list-duplicate-vars ids-only suppress-first --allow-extra-chr --out results/tmp --memory 16000")
runPLINK("--vcf VCF_file.vcf --exclude results/tmp.dupvar --recode vcf-iid --allow-extra-chr --out results/vcf_format")
10.0 years ago
Jie Ping ▴ 40

I am not an expert on this, but I have read a paper which did an IBD analysis using PLINK directly.

Why not do the IBD analysis using PLINK directly?


You are correct, PLINK does have an option for IBD analysis, but I was advised to use BEAGLE's fastIBD algorithm since it is faster and might give better results.
