Beagle imputation - duplication position error
6.5 years ago
valopes ▴ 30

Hi all,

So, after I figured out how to extract a .vcf from Illumina data [C: Getting a .vcf file from a Illumina SNPChip results (.bsc file)], I am now having problems with filtering and imputation.

As I never did that before, I've been trying a lot of options but with no success. So, let me explain it... It is a long story...

Starting from a .vcf file, I used this command to filter:

vcftools --vcf input.vcf --remove-filtered-all --max-missing 0.2 --maf 0.05 --mac 1 --min-alleles 2 --max-alleles 2 --recode --out output_filtered.vcf

After that, I've tried to do the imputation using Beagle (beagle.16May18.771.jar):

java -Xmx25000m -jar beagle.16May18.771.jar gt=input.vcf out=output_imputed

but I got an error:

No genetic map is specified: using 1 cM = 1 Mb
Exception in thread "main" java.lang.IllegalArgumentException: Duplicate marker: 0 1556719 Gm20_1556719_C_T G A
    at vcf.Markers.markerSet(Markers.java:175)
    at vcf.Markers.<init>(Markers.java:92)
    at vcf.Markers.create(Markers.java:69)
    at vcf.TargetData.extractMarkers(TargetData.java:130)
    at vcf.TargetData.advanceWindowCm(TargetData.java:120)
    at vcf.TargetData.targetData(TargetData.java:76)
    at main.Main.data(Main.java:143)
    at main.Main.main(Main.java:115)

So, I thought that I should create a .map file for the filtered .vcf, using PLINK:

plink --vcf input_filtered.vcf --recode --out output_plink_files

Then I ran Beagle again:

java -Xmx25000m -jar beagle.16May18.771.jar gt=input.vcf map=input_vcf.map out=output_imputed

and I got:

Exception in thread "main" java.lang.IllegalArgumentException: duplication position: 0 Gm20_1556719_C_T 0 1556719
    at vcf.PlinkGenMap.fillMapPositions(PlinkGenMap.java:76)
    at vcf.PlinkGenMap.<init>(PlinkGenMap.java:53)
    at vcf.PlinkGenMap.fromPlinkMapFile(PlinkGenMap.java:117)
    at vcf.GeneticMap.geneticMap(GeneticMap.java:120)
    at vcf.TargetData.targetData(TargetData.java:71)
    at main.Main.data(Main.java:143)
    at main.Main.main(Main.java:115)

Well, from what I have read, the genetic map is not the problem, but I cannot figure out the duplication position error. I am quite lost here. Could someone help me?

Oh, I also found this post [Can someone help me with imputation of missing SNPs using beagle 4?], but it still didn't work for me...

Thanks in advance!

SNP • 5.8k views

Could you search in the vcf for that position Gm20_1556719_C_T, for example using grep? I'm not sure how the SNP and chromosome identifiers are in your vcf file, you may have to search for 1556719 separately.
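A minimal sketch of such a search, matching on the POS column instead of grepping the whole line, so that the number cannot accidentally match inside an ID or a genotype field. The helper name `find_vcf_position` and the small demo file are made up for illustration; in practice you would point the function at your own VCF.

```shell
# Hypothetical helper: print every non-header VCF record at a given position.
# $1 = VCF file (tab-separated), $2 = position to look for (column 2, POS)
find_vcf_position() {
    awk -F'\t' -v pos="$2" '!/^#/ && $2 == pos' "$1"
}

# Small demo file standing in for the real VCF (assumed layout, one sample).
demo_vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$demo_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n' >> "$demo_vcf"
printf '0\t1556719\tGm05_1556719_C_T\tG\tA\t.\t.\t.\tGT\t0/0\n' >> "$demo_vcf"
printf '0\t1556719\tGm20_1556719_C_T\tG\tA\t.\t.\t.\tGT\t0/1\n' >> "$demo_vcf"
printf '0\t2000000\tGm20_2000000_A_G\tA\tG\t.\t.\t.\tGT\t0/0\n' >> "$demo_vcf"

# Show all records located at position 1556719.
find_vcf_position "$demo_vcf" 1556719
```

If more than one record comes back for the same chromosome and position, that is most likely what Beagle is complaining about.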


Yes, I did search already! And it looks duplicated...

Line 1762:
0   1556719 Gm05_1556719_C_T    G   A   .   .   .   GT  0/0 0/0 0/0 1/1 1/1 0/0 
Line 1763:
0   1556719 Gm20_1556719_C_T    G   A   .   .   .   GT  0/1 0/0 0/1 0/1 0/0 0/1

and I know this is not the only duplicated position.
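To see how many positions are affected, something along these lines could list every chromosome/position pair that occurs more than once. This is only a sketch: `list_dup_positions` is a made-up name, the demo file stands in for the real VCF, and it compares CHROM and POS only, not the alleles.

```shell
# Hypothetical helper: list CHROM, POS and record count for every
# chromosome/position pair that appears more than once in a VCF.
# $1 = VCF file (tab-separated)
list_dup_positions() {
    awk -F'\t' '
        !/^#/ { n[$1 FS $2]++ }                          # count records per CHROM+POS
        END   { for (k in n) if (n[k] > 1) print k FS n[k] }
    ' "$1" | sort
}

# Small demo file standing in for the real VCF (assumed layout, one sample).
demo_vcf=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n' > "$demo_vcf"
printf '0\t1556719\tGm05_1556719_C_T\tG\tA\t.\t.\t.\tGT\t0/0\n' >> "$demo_vcf"
printf '0\t1556719\tGm20_1556719_C_T\tG\tA\t.\t.\t.\tGT\t0/1\n' >> "$demo_vcf"
printf '0\t2000000\tGm20_2000000_A_G\tA\tG\t.\t.\t.\tGT\t0/0\n' >> "$demo_vcf"

# Columns printed: CHROM, POS, number of records at that position.
list_dup_positions "$demo_vcf"
```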


I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in your toolbar when you compose or edit a post.

It looked like you pasted the same text twice; is the output correct as I formatted it?

6.5 years ago

plink's --list-duplicate-vars flag was created specifically to address this Beagle 4 issue.


Okay I've tried this:

 --list-duplicate-vars

and then

 --exclude

It didn't work.

So, I did:

 --write-snplist

cat input.snplist | sort | uniq -d > output_new.snplist

--exclude

It's still not working...

Exception in thread "main" java.lang.IllegalArgumentException: Duplicate marker: 0 1556719 Gm20_1556719_C_T G A
    at vcf.Markers.markerSet(Markers.java:175)
    at vcf.Markers.<init>(Markers.java:92)
    at vcf.Markers.create(Markers.java:69)
    at vcf.TargetData.extractMarkers(TargetData.java:130)
    at vcf.TargetData.advanceWindowCm(TargetData.java:120)
    at vcf.TargetData.targetData(TargetData.java:76)
    at main.Main.data(Main.java:143)
    at main.Main.main(Main.java:115)
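One possible reason the `sort | uniq -d` step found nothing: it only reports IDs that occur more than once, whereas the two records posted earlier in the thread have different IDs (Gm05_... and Gm20_...) but the same chromosome (0) and position. Judging from the error message, Beagle seems to object to the shared chromosome and position, not to the ID. The sketch below, which is an awk alternative and not the plink route suggested above, keeps the first record at each CHROM+POS and drops later ones. The function name `drop_dup_positions` and the demo file are made up for illustration.

```shell
# Hypothetical helper: copy a VCF to stdout, keeping header lines and only
# the first record seen for each CHROM+POS pair.
# $1 = VCF file (tab-separated)
drop_dup_positions() {
    awk -F'\t' '
        /^#/ { print; next }        # pass header lines through unchanged
        !seen[$1 FS $2]++           # print a record only the first time its CHROM+POS appears
    ' "$1"
}

# Small demo file standing in for the real VCF (assumed layout, one sample).
demo_vcf=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n' > "$demo_vcf"
printf '0\t1556719\tGm05_1556719_C_T\tG\tA\t.\t.\t.\tGT\t0/0\n' >> "$demo_vcf"
printf '0\t1556719\tGm20_1556719_C_T\tG\tA\t.\t.\t.\tGT\t0/1\n' >> "$demo_vcf"
printf '0\t2000000\tGm20_2000000_A_G\tA\tG\t.\t.\t.\tGT\t0/0\n' >> "$demo_vcf"

# Write the de-duplicated copy next to the demo file.
drop_dup_positions "$demo_vcf" > "$demo_vcf.nodup"
```

Which record to keep is arbitrary here. Since both records sit on chromosome 0 while their IDs mention Gm05 and Gm20, it may be worth checking whether the chromosome assignment was lost when the VCF was created, because in that case dropping one of the two markers would throw away real data.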

3.8 years ago

Hi! I have the same issue as you. May I ask how you managed to solve it? Many thanks!!

This is how it looks for me:

Exception in thread "main" java.lang.IllegalArgumentException: Duplicate marker: 1 59409838 ARS-USMARC-Parent-DQ404150-rs29012530_dup T C
    at vcf.Markers.markerSet(Markers.java:131)
    at vcf.Markers.<init>(Markers.java:85)
    at vcf.Markers.create(Markers.java:64)
    at vcf.BasicGT.markers(BasicGT.java:105)
    at vcf.BasicGT.<init>(BasicGT.java:86)
    at vcf.TargetData.targGT(TargetData.java:92)
    at vcf.TargetData.advanceWindowCm(TargetData.java:120)
    at main.Main.phaseData(Main.java:158)
    at main.Main.main(Main.java:113)


Did you try to use plink --list-duplicate-vars followed by --exclude? This should work if you use them correctly.
