I'm having trouble phasing a multi-sample (9 samples) VCF file produced by GATK HaplotypeCaller with Beagle 5.2. I do not have a genetic map or reference panel. I am working with a very heterozygous group of organisms (sea urchins). When I run Beagle with the following command:
java -Xmx100g -jar beagle_5.2.jar gt=filtered_calls.vcf.gz out=phased impute=false
I get this error:
Window 470 [NW_022145483.1:11416-24979]
Reference markers: 630
Study markers: 630
Burnin iteration 1: 0 seconds
Exception in thread "main" java.lang.IllegalArgumentException: 0
at main.RunStats.printEstimatedNe(RunStats.java:260)
at main.Main.phaseStage1Variants(Main.java:195)
at main.Main.phaseTarg(Main.java:181)
at main.Main.phaseAndImpute(Main.java:171)
at main.Main.main(Main.java:126)
Beagle usually runs fine for about 15 minutes and outputs 1.4 GB of phased genotypes, then crashes. I'm not sure what this error means. I have tried changing the memory allocation and the window size (anywhere between 5 and 40), but neither has helped. When I change the window size, it crashes while processing a different scaffold. I'm fairly sure that there isn't a problem with my input VCF file, as other programs have run successfully using it as input.
Are there other programs that will phase without a genetic map and reference panel? Should I split my VCF file into separate files for each chromosome before phasing with Beagle? I have access to an HPC cluster, so memory should not be an issue.
There's really no point at all trying to phase with just 9 samples and no reference panel. It just isn't going to give any kind of meaningful results, especially if the samples are very heterozygous. It's probably not what you want to hear, but unreliable phasing will really affect your downstream results.
I appreciate the feedback. I have heard the same thing from several others now too.
Update: The assembly I'm working with (Spur_5.0) contains 871 scaffolds. However, the 21 largest scaffolds make up 90% of the bases in the genome and correspond to the 21 autosomes. I was able to phase the largest 21 scaffolds individually by specifying chrom=[chrom] in my Beagle command. Maybe there is too much missing data in some of the smaller scaffolds to phase them?
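In case it helps anyone else, here is a rough Python sketch of how the per-scaffold runs could be scripted. It reuses the file names and options from my original command and adds chrom=; the helper names (contigs_from_header, largest_contigs, beagle_command, run_all) and the out=phased.<scaffold> naming are my own assumptions, and it presumes the VCF header has ##contig lines with length fields, which may not be true for every file. It records which scaffolds fail instead of stopping at the first crash.

```python
import re
import subprocess

def contigs_from_header(header_lines):
    """Collect (name, length) pairs from the ##contig lines of a VCF header."""
    contigs = []
    for line in header_lines:
        if not line.startswith("##contig="):
            continue
        name = re.search(r"ID=([^,>]+)", line)
        length = re.search(r"length=(\d+)", line)
        if name:
            # A contig line without a length is kept with length 0.
            contigs.append((name.group(1), int(length.group(1)) if length else 0))
    return contigs

def largest_contigs(contigs, n):
    """Names of the n longest contigs, longest first."""
    ranked = sorted(contigs, key=lambda c: c[1], reverse=True)
    return [name for name, _ in ranked[:n]]

def beagle_command(chrom, gt="filtered_calls.vcf.gz",
                   jar="beagle_5.2.jar", mem="100g"):
    """Build one Beagle call restricted to a single scaffold via chrom=."""
    return ["java", f"-Xmx{mem}", "-jar", jar,
            f"gt={gt}",
            f"out=phased.{chrom}",   # assumed naming: one output per scaffold
            f"chrom={chrom}",
            "impute=false"]

def run_all(chroms, dry_run=True):
    """Run Beagle once per scaffold; return the scaffolds that failed."""
    failed = []
    for chrom in chroms:
        cmd = beagle_command(chrom)
        if dry_run:
            # Only print the command so it can be checked before running.
            print(" ".join(cmd))
            continue
        if subprocess.run(cmd).returncode != 0:
            failed.append(chrom)
    return failed
```

To use it, read the header lines (those starting with ##) from the VCF, pass them to contigs_from_header, take largest_contigs(contigs, 21), and call run_all on that list, first with dry_run=True to inspect the commands. The returned list of failed scaffolds would show whether the crashes are confined to the small ones.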