Hi, I have PLINK format data (PED/MAP) that I want to convert to VCF so that I can phase it with BEAGLE 4.1, which only accepts VCF input. I am looking for a simple one-line solution rather than a pipeline using PSEQ, MEGA2, etc.
I saw that in PLINK 1.9 one can just use --recode vcf to achieve this. However, when I did this and ran BEAGLE (gt) on the output, it gave me Java exceptions. It's not a problem with the BEAGLE jar file, as it runs fine on the sample VCF data downloaded from 1000 Genomes. But when I convert my data to VCF using PLINK and use that as BEAGLE 4.1 input, it fails. It would be great if anyone could help me with this, for example with a workaround or another simple method for converting PLINK to VCF for BEAGLE input.
Error snippet:
Exception in thread "main" java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: nSamples==0
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:593)
at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
at java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:714)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at h.G.c(Unknown Source)
at h.G.a(Unknown Source)
at main.Main.main(Unknown Source)
Caused by: java.lang.IllegalArgumentException: nSamples==0
at h.I.<init>(Unknown Source)
at h.e.<init>(Unknown Source)
at h.G.a(Unknown Source)
at h.G.a(Unknown Source)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:747)
at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:721)
at java.util.stream.AbstractTask.compute(AbstractTask.java:316)
at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
To give a description of what I am doing to convert PLINK to VCF:
- Converting PLINK to .bgl using PLINK 1.9
- Converting .bgl to VCF using beagle2vcf.jar
- Post-processing to make it tab-separated
- Running BEAGLE 4.1, only to get the aforementioned error
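Since the error is nSamples==0 and one of the steps above changes the delimiter, a quick check of how many sample columns the VCF header actually exposes may help. The following is only a minimal Python sketch, not something taken from BEAGLE or PLINK: the function names are made up for illustration, and it assumes the usual VCF layout of nine fixed tab-separated columns (CHROM through FORMAT) before the sample IDs. A header that is space-delimited rather than tab-delimited is one possible cause of the error, but that is not confirmed here.

```python
import gzip

# Assumption: a VCF with genotypes has 9 fixed columns (CHROM..FORMAT)
# before the per-sample columns.
FIXED_COLUMNS = 9


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_vcf_samples(path):
    """Return (samples seen when splitting on tabs,
               samples seen when splitting on any whitespace)
    based on the #CHROM header line."""
    with open_text(path) as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                line = line.rstrip("\r\n")
                by_tab = len(line.split("\t"))
                by_space = len(line.split())
                return (max(by_tab - FIXED_COLUMNS, 0),
                        max(by_space - FIXED_COLUMNS, 0))
            if not line.startswith("#"):
                break  # reached the records without seeing a #CHROM line
    raise ValueError("no #CHROM header line found before the first record")
```

If the first number is 0 while the second is not, the header is probably delimited by spaces instead of tabs. Sample IDs that themselves contain spaces would inflate the second number, so treat it as a hint only.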
Thanks,
Aritra
Can you post the errors?
Hi Zev,
Added the errors.
Thanks.
Have you tried running --list-duplicate-vars, and then using --exclude on the listed variant IDs before exporting a VCF? Also, what do the first few non-header lines of the VCF look like?
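If it is easier than opening the file by hand, the first few non-header lines can be pulled out with a short script. This is just a rough Python sketch; the function name and the file name in the usage line are placeholders, and it assumes header lines are exactly those starting with "#".

```python
import gzip
from itertools import islice


def first_records(path, n=5):
    """Return the first n non-header lines of a VCF, without trailing newlines."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Header and meta lines start with '#'; everything else is a record.
        records = (line.rstrip("\r\n") for line in handle
                   if not line.startswith("#"))
        return list(islice(records, n))
```

Printing each line with repr(), for example `for line in first_records("mydata.vcf"): print(repr(line))`, makes tabs show up as \t, so it is easy to see whether the columns are tab-separated.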
Thanks for the input, Christopher. I ran --list-duplicate-vars, but for this particular dataset it didn't return any duplicate variants (I do get duplicate variants for other datasets, which I am not currently using). The non-header lines of the VCF file (after --recode vcf) look like this:
The header lines look like this:
The error that I get in BEAGLE 4 when I use the --recode vcf file is this:
It'd be great if you could help me with this.
Thanks.
I'm running into the same issue. Have you resolved this or found another work-around?
Have run into the same problem. Did you ever find a solution to it?