Hi all,
So, after I figured out how to extract a .vcf from Illumina data [C: Getting a .vcf file from a Illumina SNPChip results (.bsc file)], I am now having problems filtering the data and running the imputation.
As I've never done this before, I've tried a lot of options without success. Let me explain... it is a long story.
Starting from the .vcf file, I used this command to filter:
vcftools --vcf input.vcf --remove-filtered-all --max-missing 0.2 --maf 0.05 --mac 1 --min-alleles 2 --max-alleles 2 --recode --out output_filtered.vcf
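Two details of this command may be worth double-checking. The vcftools manual defines --max-missing as a proportion between 0 and 1, where 0 allows sites that are completely missing and 1 allows no missing data, so --max-missing 0.2 keeps sites with up to 80% missing genotypes, which is looser than the name suggests. Also, with --recode vcftools writes to the --out prefix plus .recode.vcf, so the filtered file here would be output_filtered.vcf.recode.vcf. The Python sketch below approximates the per-site call rate and minor allele frequency that these two filters look at. It assumes biallelic, diploid GT strings (as enforced by --min-alleles 2 --max-alleles 2) and is only an illustration, not vcftools' own implementation; the function names are hypothetical.

```python
def site_stats(gt_calls):
    """Return (call_rate, maf) for one site from GT strings like '0/1' or './.'.

    Assumes a biallelic site where '1' is the only ALT allele.
    """
    alleles = []
    called_samples = 0
    for gt in gt_calls:
        parts = gt.replace("|", "/").split("/")
        if "." in parts:
            continue  # any missing allele counts as a missing genotype here
        called_samples += 1
        alleles.extend(parts)
    call_rate = called_samples / len(gt_calls) if gt_calls else 0.0
    if not alleles:
        return call_rate, 0.0
    alt_freq = alleles.count("1") / len(alleles)
    return call_rate, min(alt_freq, 1.0 - alt_freq)


def keep_site(gt_calls, max_missing=0.2, maf=0.05):
    """Approximate the intent of vcftools --max-missing and --maf.

    max_missing is the minimum fraction of called genotypes (1 = no missing
    data allowed), mirroring how vcftools documents --max-missing.
    """
    call_rate, site_maf = site_stats(gt_calls)
    return call_rate >= max_missing and site_maf >= maf
```

For example, a site called in 3 of 4 samples with genotypes 0/0, 0/1 and 1/1 has a call rate of 0.75 and a MAF of 0.5, so it passes with the thresholds used above but would fail with --max-missing 0.9.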
After that, I tried to run the imputation with Beagle (beagle.16May18.771.jar):
java -Xmx25000m -jar beagle.16May18.771.jar gt=input.vcf out=output_imputed
but I got an error:
No genetic map is specified: using 1 cM = 1 Mb
Exception in thread "main" java.lang.IllegalArgumentException: Duplicate marker: 0 1556719 Gm20_1556719_C_T G A
    at vcf.Markers.markerSet(Markers.java:175)
    at vcf.Markers.<init>(Markers.java:92)
    at vcf.Markers.create(Markers.java:69)
    at vcf.TargetData.extractMarkers(TargetData.java:130)
    at vcf.TargetData.advanceWindowCm(TargetData.java:120)
    at vcf.TargetData.targetData(TargetData.java:76)
    at main.Main.data(Main.java:143)
    at main.Main.main(Main.java:115)
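Beagle stops at the first duplicate it meets, so it can help to list all of them at once. A minimal Python sketch, assuming an uncompressed VCF and treating records that share CHROM and POS as duplicates (Beagle's exact duplicate rule may differ; the function names are hypothetical). Note that the error reports CHROM 0 for a marker whose ID starts with Gm20: if the IDs collected at one position carry different GmNN prefixes, one possibility, not confirmed in this thread, is that the chromosome was written as 0 for every marker during the conversion, so markers from different chromosomes at the same base-pair position collide.

```python
from collections import defaultdict


def find_duplicate_positions(vcf_lines):
    """Map (CHROM, POS) -> marker IDs for every position seen more than once."""
    ids_by_pos = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip ## meta lines, the #CHROM header and blank lines
        chrom, pos, marker_id = line.rstrip("\n").split("\t")[:3]
        ids_by_pos[(chrom, int(pos))].append(marker_id)
    return {key: ids for key, ids in ids_by_pos.items() if len(ids) > 1}


def report_duplicates(vcf_path):
    """Print CHROM, POS and the comma-separated IDs of each duplicated position."""
    with open(vcf_path) as handle:
        duplicates = find_duplicate_positions(handle)
    for (chrom, pos), ids in sorted(duplicates.items()):
        print(chrom, pos, ",".join(ids), sep="\t")
```

For example, report_duplicates("output_filtered.vcf.recode.vcf") would print one tab-separated line per duplicated position.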
So I thought I should create a .map file from the filtered .vcf using PLINK:
plink --vcf input_filtered.vcf --recode --out output_plink_files
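A PLINK .map file has four columns: chromosome, variant ID, genetic position and base-pair position. Beagle's map= expects the genetic position in cM, whereas a map made by plink --recode from a VCF normally has 0 in that column and inherits any duplicate positions from the VCF, which is consistent with the second error below. The Python sketch builds a map with the same 1 cM = 1 Mb approximation Beagle already uses when no map is given, keeping one record per position. It only illustrates the format and adds no real genetic information; it assumes the VCF is uncompressed and sorted by position within each chromosome, and the function names are hypothetical.

```python
def vcf_to_plink_map(vcf_lines):
    """Return PLINK map lines (CHR, ID, cM, BP), one per (CHROM, POS).

    Genetic position is approximated as 1 cM per Mb. Input order is kept,
    so the VCF is assumed to be sorted within each chromosome.
    """
    seen = set()
    map_lines = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip ## meta lines, the #CHROM header and blank lines
        chrom, pos, marker_id = line.rstrip("\n").split("\t")[:3]
        bp = int(pos)
        if (chrom, bp) in seen:
            continue  # a map should not list the same position twice
        seen.add((chrom, bp))
        cm = bp / 1_000_000  # 1 cM = 1 Mb approximation
        map_lines.append(f"{chrom}\t{marker_id}\t{cm:.6f}\t{bp}")
    return map_lines


def write_plink_map(vcf_path, map_path):
    """Write the map lines for an uncompressed VCF to map_path."""
    with open(vcf_path) as src:
        map_lines = vcf_to_plink_map(src)
    with open(map_path, "w") as dst:
        dst.write("\n".join(map_lines) + "\n")
```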
Then I ran Beagle again:
java -Xmx25000m -jar beagle.16May18.771.jar gt=input.vcf map=input_vcf.map out=output_imputed
and got:
Exception in thread "main" java.lang.IllegalArgumentException: duplication position: 0 Gm20_1556719_C_T 0 1556719
    at vcf.PlinkGenMap.fillMapPositions(PlinkGenMap.java:76)
    at vcf.PlinkGenMap.<init>(PlinkGenMap.java:53)
    at vcf.PlinkGenMap.fromPlinkMapFile(PlinkGenMap.java:117)
    at vcf.GeneticMap.geneticMap(GeneticMap.java:120)
    at vcf.TargetData.targetData(TargetData.java:71)
    at main.Main.data(Main.java:143)
    at main.Main.main(Main.java:115)
From what I've read, the genetic map is not the problem, but I cannot figure out the duplicate position error. I am quite lost here. Could someone help me?
Oh, I also found this post [Can someone help me with imputation of missing SNPs using beagle 4?], but it still didn't work for me...
Thanks in advance!
Could you search the vcf for that position, Gm20_1556719_C_T, for example using grep? I'm not sure how the SNP and chromosome identifiers are written in your vcf file; you may have to search for 1556719 separately.
Yes, I did search already! And it looks duplicated...
and I know this is not the only duplicated position.
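Once the duplicates are confirmed, one way past Beagle's check is to keep only the first record at each position; bcftools norm also has a --rm-dup option for this. The Python sketch below does it for an uncompressed VCF, with hypothetical function names. Keeping the first copy is an arbitrary choice, and if the duplicated records turn out to be different markers (for example from different chromosomes sharing CHROM 0), correcting the CHROM column would be better than deleting them.

```python
def drop_duplicate_positions(vcf_lines):
    """Yield header lines unchanged and only the first record per (CHROM, POS)."""
    seen = set()
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep ## meta lines and the #CHROM header
            continue
        if not line.strip():
            continue  # skip blank lines
        chrom, pos = line.split("\t", 2)[:2]
        key = (chrom, int(pos))
        if key in seen:
            continue  # later records at this position are discarded, not merged
        seen.add(key)
        yield line


def dedup_vcf(in_path, out_path):
    """Write a copy of an uncompressed VCF without repeated positions."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(drop_duplicate_positions(src))
```

The de-duplicated file would then be passed to Beagle with gt= in place of the original.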
I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in your toolbar when you compose or edit a post.
It looked like you pasted the same output twice; is the output correct as I formatted it?