I'm trying to phase a short vcf file of a SINGLE individual. It has exome data from 6 genes in 4 chromosomes. In total the vcf has 30 variants. The source is the Torrent Variant Caller. In the Fixed section the chromosomes are included as "chrN". The VCF has been referenced with hg19.
./eagle \
--vcf ./C11.vcf \
--chrom 22 \
--geneticMapFile ./tables/genetic_map_hg19_withX.txt.gz \
--outPrefix ./tmp/prueba_phased
Even though the software has some issues to recognize the chro header, I believe it starts with no problem:
=== Reading genotype data ===
Reading genotypes for N = 1 samples
[W::vcf_parse] Contig 'chr7' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chr10' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chr15' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chr22' is not defined in the header. (Quick workaround: index the file with tabix.)
Read M = 11 variants
Filling in genetic map coordinates using reference file:
./tables/genetic_map_hg19_withX.txt.gz
Physical distance range: 3967 base pairs
Genetic distance range: 0.000207245 cM
Average # SNPs per cM: 53077
Auto-selecting --maxBlockLen: 0.25 cM
Number of <=(64-SNP, 0.25cM) segments: 1
Average # SNPs per segment: 11
Fraction of heterozygous genotypes: 0
SKIPPED STEP 1
SKIPPED STEP 2
BEGINNING STEP 3 (PBWT ITERS)
Then it starts to phase the sample trhogu 1 to 10 batches:
BATCH 1 OF 10
Phasing samples 1-0
Time for phasing batch: 3.09944e-06
BATCH 2 OF 10
Phasing samples 1-0
Time for phasing batch: 691414e-06
When it gets to the tenth batch it changes and I get the error:
BATCH 10 OF 10
Phasing samples 1-1
WARNING: Sample 1 (1-indexed) has a het count of 0
ERROR: Failed to allocate 18446744073709551596 bytes
I don't understand why I get this error if I just have 11 variants in these gene and why it says that sample 1 has a het count of 0. Below you can see the Fixed section for the Chr 22:
CHROM POS ID REF ALT QUAL FILTER
chr22 42522613 NA G C 100.0 PASS
chr22 42523409 NA G T 100.0 PASS
chr22 42523943 NA A G 100.0 PASS
chr22 42525952 NA C A 100.0 PASS
chr22 42526484 NA A C 100.0 PASS
chr22 42526549 NA C T 100.0 PASS
chr22 42526561 NA GG TC 100.0 PASS
chr22 42526567 NA G A 100.0 PASS
chr22 42526571 NA C G 100.0 PASS
chr22 42526573 NA T G 100.0 PASS
I hope you can help me to fix this error. Thanks.
Thank you for your input !Yes! Im aware of that. And, as you can see I added the reference panel:
--geneticMapFile ./tables/genetic_map_hg19_withX.txt.gz \
I also split by chromosome, I specifically asked for the chromosome 22 as you can see in the first script.The argument
--geneticMapFile
specifies a genetic map, which gives the recombination rates between each SNP. This is different to a reference panel of haplotypes used for the phasing. You gave given the first but not the second. You need to download something like the 1000 genomes and use it as a reference,I see, you are right. I need to include the
--vcfRef
argument. Thanks.I'm still having a problem with the reference genome... it seems that my variants are not present in the reference. But does not make any sense since I'm using a b37 reference genome in a h19 built target vcf.