vg deconstruct: how can I generate a VCF (in hg38 coords) of differences between hg38 and CHM13?
20 months ago
weisburd • 0

I downloaded https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/minigraph/hprc-v1.0-minigraph-grch38.gfa.gz, which contains hg38, CHM13, and other assemblies, and am now trying to use vg to generate a VCF of the variants in CHM13 relative to hg38.

After converting to vg format by running vg convert <(gunzip -c hprc-v1.0-minigraph-grch38.gfa.gz) > hprc-v1.0-minigraph-grch38.vg, I tried a few variations of vg deconstruct, but they all crash (https://github.com/vgteam/vg/issues/3960) except for

vg deconstruct --path chr1 hprc-v1.0-minigraph-grch38.vg --verbose -t 2 -e

which outputs an empty VCF (header only, no records):

$ vg deconstruct --path chr1 hprc-v1.0-minigraph-grch38.vg --verbose -t 2 -e
Computed overlay in 3.26128 seconds using 3.33672 CPU seconds.
Finding snarls
Deconstructing top-level snarls
##fileformat=VCFv4.2
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=CONFLICT,Number=.,Type=String,Description="Sample names for which there are multiple paths in the graph with conflicting alleles">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Total number of alternate alleles in called genotypes">
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated allele frequency in the range (0,1]">
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of samples with data">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=AT,Number=R,Type=String,Description="Allele Traversal as path in graph">
##contig=<ID=chr1,length=248956422>
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  CHM13   HG00438 HG00621 HG00673 HG00733 HG00735 HG00741 HG01071 HG01106 HG01109 HG01123 HG01175 HG01243 HG01258 HG01358 HG01361 HG01891 HG01928 HG01952 HG01978 HG02055 HG02080 HG02109 HG02145 HG02148 HG02257 HG02486 HG02559 HG02572 HG02622 HG02630 HG02717 HG02723 HG02818 HG02886 HG03098 HG03453 HG03486 HG03492 HG03516 HG03540 HG03579 NA18906 NA20129 NA21309 chr10   chr11   chr11_KI270721v1_random chr12   chr13   chr14   chr14_GL000009v2_random chr14_GL000194v1_random chr14_GL000225v1_random chr14_KI270722v1_random chr14_KI270723v1_random chr14_KI270724v1_random chr14_KI270725v1_random chr14_KI270726v1_random chr15   chr15_KI270727v1_random chr16   chr16_KI270728v1_random chr17   chr17_GL000205v2_random chr17_KI270729v1_random chr17_KI270730v1_random chr18   chr19   chr1_KI270706v1_random  chr1_KI270707v1_random  chr1_KI270708v1_random  chr1_KI270709v1_random  chr1_KI270710v1_random  chr1_KI270711v1_random  chr1_KI270712v1_random  chr1_KI270713v1_random  chr1_KI270714v1_random  chr2    chr20   chr21   chr22   chr22_KI270731v1_random chr22_KI270732v1_random chr22_KI270733v1_random chr22_KI270734v1_random chr22_KI270735v1_random chr22_KI270736v1_random chr22_KI270737v1_random chr22_KI270738v1_random chr22_KI270739v1_random chr2_KI270715v1_random  chr2_KI270716v1_random  chr3    chr3_GL000221v1_random  chr4    chr4_GL000008v2_random  chr5    chr5_GL000208v1_random  chr6    chr7    chr8    chr9    chr9_KI270717v1_random  chr9_KI270718v1_random  chr9_KI270719v1_random  chr9_KI270720v1_random  chrM    chrUn_GL000195v1    chrUn_GL000213v1    chrUn_GL000214v1    chrUn_GL000216v2    chrUn_GL000218v1    chrUn_GL000219v1    chrUn_GL000220v1    chrUn_GL000224v1    chrUn_GL000226v1    chrUn_KI270302v1    chrUn_KI270303v1    chrUn_KI270304v1    chrUn_KI270305v1    chrUn_KI270310v1    chrUn_KI270311v1    chrUn_KI270312v1    chrUn_KI270315v1    chrUn_KI270316v1    chrUn_KI270317v1    chrUn_KI270320v1    chrUn_KI270322v1    
chrUn_KI270329v1    chrUn_KI270330v1    chrUn_KI270333v1    chrUn_KI270334v1    chrUn_KI270335v1    chrUn_KI270336v1    chrUn_KI270337v1    chrUn_KI270338v1    chrUn_KI270340v1    chrUn_KI270362v1    chrUn_KI270363v1    chrUn_KI270364v1    chrUn_KI270366v1    chrUn_KI270371v1    chrUn_KI270372v1    chrUn_KI270373v1    chrUn_KI270374v1    chrUn_KI270375v1    chrUn_KI270376v1    chrUn_KI270378v1    chrUn_KI270379v1    chrUn_KI270381v1    chrUn_KI270382v1    chrUn_KI270383v1    chrUn_KI270384v1    chrUn_KI270385v1    chrUn_KI270386v1    chrUn_KI270387v1    chrUn_KI270388v1    chrUn_KI270389v1    chrUn_KI270390v1    chrUn_KI270391v1    chrUn_KI270392v1    chrUn_KI270393v1    chrUn_KI270394v1    chrUn_KI270395v1    chrUn_KI270396v1    chrUn_KI270411v1    chrUn_KI270412v1    chrUn_KI270414v1    chrUn_KI270417v1    chrUn_KI270418v1    chrUn_KI270419v1    chrUn_KI270420v1    chrUn_KI270422v1    chrUn_KI270423v1    chrUn_KI270424v1    chrUn_KI270425v1    chrUn_KI270429v1    chrUn_KI270435v1    chrUn_KI270438v1    chrUn_KI270442v1    chrUn_KI270448v1    chrUn_KI270465v1    chrUn_KI270466v1    chrUn_KI270467v1    chrUn_KI270468v1    chrUn_KI270507v1    chrUn_KI270508v1    chrUn_KI270509v1    chrUn_KI270510v1    chrUn_KI270511v1    chrUn_KI270512v1    chrUn_KI270515v1    chrUn_KI270516v1    chrUn_KI270517v1    chrUn_KI270518v1    chrUn_KI270519v1    chrUn_KI270521v1    chrUn_KI270522v1    chrUn_KI270528v1    chrUn_KI270529v1    chrUn_KI270530v1    chrUn_KI270538v1    chrUn_KI270539v1    chrUn_KI270544v1    chrUn_KI270548v1    chrUn_KI270579v1    chrUn_KI270580v1    chrUn_KI270581v1    chrUn_KI270582v1    chrUn_KI270583v1    chrUn_KI270584v1    chrUn_KI270587v1    chrUn_KI270588v1    chrUn_KI270589v1    chrUn_KI270590v1    chrUn_KI270591v1    chrUn_KI270593v1    chrUn_KI270741v1    chrUn_KI270742v1    chrUn_KI270743v1    chrUn_KI270744v1    chrUn_KI270745v1    chrUn_KI270746v1    chrUn_KI270747v1    chrUn_KI270748v1    chrUn_KI270749v1    chrUn_KI270750v1    chrUn_KI270751v1    
chrUn_KI270752v1    chrUn_KI270753v1    chrUn_KI270754v1    chrUn_KI270755v1    chrUn_KI270756v1    chrUn_KI270757v1    chrX    chrY    chrY_KI270740v1_random

before exiting.

What's the correct way to do this?

variant-graph vg • 1.6k views
20 months ago
weisburd • 0

I found that a VCF had already been generated for that pangenome graph. See the discussion in the vg issue linked in the opening post.
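
If that VCF carries one column per sample, like the header shown in the question, pulling out just the CHM13 calls might look like the sketch below. It assumes an uncompressed, tab-delimited VCF with GT as the first FORMAT field and a sample column literally named CHM13; the function name sample_variants and the file names are made up for illustration. INFO fields such as AC and AF are passed through unchanged, so they still describe all samples.

```shell
# Keep header lines plus records where the named sample carries a non-reference allele.
# Usage: sample_variants SAMPLE < input.vcf > output.vcf
sample_variants() {
    awk -F'\t' -v OFS='\t' -v s="$1" '
        /^##/ { print; next }    # pass meta-information lines through unchanged
        /^#CHROM/ {
            # locate the requested sample column by name
            for (i = 10; i <= NF; i++) if ($i == s) col = i
            if (!col) { print "sample not found: " s > "/dev/stderr"; exit 1 }
            print $1, $2, $3, $4, $5, $6, $7, $8, $9, $col
            next
        }
        {
            split($col, f, ":")  # GT is assumed to be the first FORMAT field
            if (f[1] ~ /[1-9]/)  # at least one non-reference allele index
                print $1, $2, $3, $4, $5, $6, $7, $8, $9, $col
        }
    '
}
```

For example: sample_variants CHM13 < all_samples.vcf > chm13_only.vcf (both file names are placeholders).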

20 months ago
Sasha ▴ 850

Maybe try the following steps:

  1. Create an indexed graph. An xg index should be enough here; the GCSA index (-g, -k) is used for read mapping and is costly to build for a whole-genome graph:
    vg index -x hprc-v1.0-minigraph-grch38.xg hprc-v1.0-minigraph-grch38.vg
    
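  2. Use vg deconstruct with the hg38 chr1 path as the reference (-p); CHM13 then appears as one of the sample columns in the VCF. (In vg deconstruct, -a is --all-snarls and takes no argument, and -r expects a snarls file, so neither selects a sample.)
    vg deconstruct -p chr1 hprc-v1.0-minigraph-grch38.xg > CHM13_hg38_variants.vcf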
    

Replace chr1 with the chromosome you are interested in. If you want to generate VCF files for all chromosomes, you can run the above command in a loop for each chromosome.
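
The per-chromosome loop could be sketched as below. This is a dry run that only prints the commands, so they can be checked before anything is executed. It assumes the reference paths are named chr1 to chr22, chrX and chrY, as in the contig names in the question; the output names chrN.vcf and the function name are made up, and only the -p flag is shown, so add whatever other vg deconstruct options turn out to work for this graph.

```shell
# Dry run: print one vg deconstruct command per primary chromosome.
graph="hprc-v1.0-minigraph-grch38.xg"

print_deconstruct_cmds() {
    for chrom in $(seq 1 22) X Y; do
        # -p selects the reference path; output file names are placeholders
        echo "vg deconstruct -p chr${chrom} ${graph} > chr${chrom}.vcf"
    done
}

print_deconstruct_cmds
```

Once the printed commands look right, they can be run with print_deconstruct_cmds | sh.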

I'm using my chatbot here (https://tinybio.cloud) to help generate this answer. You can download it from the website.

Good luck with your research!
