how to get path IDs from vg call with snarls
2.3 years ago

My pangenome graph was constructed using minigraph and then used to call structural variants with vg giraffe. The commands were as follows:

    vg view -F pangenome.gfa --threads 100 > pangenome.vg
    vg mod -X 256 pangenome.vg -t 100 > pangenome.mod.vg
    vg autoindex --workflow giraffe -r ref.fa -p pangenome.mod -g pangenome.mod.vg -t 100
    vg chunk -x pangenome.mod.giraffe.gbz -M -O pg -t 25
    vg snarls chunk_chr1H.pg > chunk_chr1H.snarls

    vg giraffe -Z pangenome.mod.giraffe.gbz -m pangenome.mod.min -d pangenome.mod.dist -f fq1.gz -f fq2.gz -t 40 > sample.gam
    vg gamsort sample.gam -t 15 -i sample.sorted.gam.gai > sample.sorted.gam
    vg chunk -x pangenome.mod.giraffe.gbz -g -a sample.sorted.gam -c 1 -t 15
    vg pack -x chunk_chr1H.pg -g chunk_4_chr1H_0_516505931.gam -o chunk_chr1H.pack -t 15
    vg call chunk_chr1H.pg -r chunk_chr1H.snarls -k chunk_chr1H.pack -s sample -a -t 15 > sample.chr1H.vcf

The first variant record was as follows:

    chr1H 9358 >293>306 CTGCTGTGATTCGTACTTTCCGGACACCCTAGGTTCCGGTTACGGGGAACTTGTCAAAACTCATAGTTTTGGCCTATTTCGGCTAGTTTTGTATGCTATTAGTCACTAATTTTGGGTCCCGTCGCCATCCGAACATTTTCGGAATCTCGGGGTCCGGTTAAGGGGAAATCATGAAAACTCGTAGTTTTGGCCTATCTCGGCCAGTTTTGTATGCTATTACTCACTGATTTTGGTTCCCACTGCGATCCAAACGTTTCGGGAACCCCGGGATCCGGTTACGGGGAACTCCTCAAAACTCACAGTTTTGGTCTATTTTGGCCAGTTTTGTATGCTATTACTCACTGATTTTGGGTCC C 278.732 lowdepth DP=5415;AT=>293>294>295>296>297>298>299>300>301>302>303>304>305>306,>293>306

I want to know the meaning of the numbers in the ID column and the INFO column, such as 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306. Are these snarl numbers?

More importantly, how can I find the path numbers in the pangenome.mod.vg file that correspond to these snarl numbers?

2.1 years ago
anovak ▴ 130

The numbers are node ID numbers, and the > characters mean they are being visited in their local forward orientations; < would mean visiting the nodes in their reverse orientations.

The ID is, I think, the pair of bounding nodes of the snarl (so this variant calls the snarl between node 293 forward and node 306 forward). Snarls don't have stable numbers of their own; they are defined by their bounding oriented nodes.

Then the AT tag is giving the path traversed by each allele. The first allele visits several nodes (293, 294, 295...) all in forward orientation, and the second allele just visits 293 and 306.
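
To make the notation above concrete, here is a minimal Python sketch that splits the ID and AT strings from the record in the question into oriented node visits. The function names (`parse_traversal`, `parse_at`) are made up for illustration and are not part of vg; the sketch only assumes the `>`/`<` plus node ID format described above.

```python
import re

# One step of a traversal: '>' (forward) or '<' (reverse) followed by a node ID.
STEP = re.compile(r"([<>])(\d+)")

def parse_traversal(text):
    """Turn '>293>306' into [(293, 'forward'), (306, 'forward')]."""
    return [(int(node), "forward" if sign == ">" else "reverse")
            for sign, node in STEP.findall(text)]

def parse_at(info):
    """Pull the AT tag out of a VCF INFO string; one traversal per allele."""
    for field in info.split(";"):
        if field.startswith("AT="):
            return [parse_traversal(t) for t in field[3:].split(",")]
    return []

# ID and INFO columns copied from the record in the question.
vcf_id = ">293>306"
info = ("DP=5415;AT=>293>294>295>296>297>298>299>300>301>302>303>304>305>306,"
        ">293>306")

snarl_bounds = parse_traversal(vcf_id)
alleles = parse_at(info)
first_allele_nodes = [node for node, _ in alleles[0]]
second_allele_nodes = [node for node, _ in alleles[1]]

print("snarl bounds:", snarl_bounds)
print("first allele visits:", first_allele_nodes)
print("second allele visits:", second_allele_nodes)
```

The node IDs this produces are the ones to look up in the graph; they are not snarl or path numbers.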

I'm not sure that pangenome.mod.vg has any numbered paths associated with snarls. You can follow https://github.com/vgteam/vg/wiki/Visualization#visualizing-subgraphs and https://github.com/vgteam/vg/wiki/Visualization#viewing-alignments if you want to look at how the reads align to the pangenome.mod.vg graph in the vicinity of the variant.
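
If the graph does carry embedded paths, one rough way to check which of them pass through an allele's nodes is to scan a GFA text export of the graph. The sketch below is only that: it assumes the paths appear as GFA `P` lines (it ignores `W` lines), the function name `paths_touching` is hypothetical, and the toy GFA lines and path names are invented for the example rather than taken from the real graph.

```python
def paths_touching(gfa_text, node_ids):
    """Map path name -> sorted node IDs from node_ids that the path visits.

    Only GFA 'P' lines are read; 'W' lines are not handled in this sketch.
    """
    wanted = {str(n) for n in node_ids}
    hits = {}
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "P" or len(fields) < 3:
            continue
        # P-line steps look like '293+,294+,306-'; drop the orientation sign.
        visited = {step[:-1] for step in fields[2].split(",") if step}
        found = sorted(int(n) for n in visited & wanted)
        if found:
            hits[fields[1]] = found
    return hits

# Toy input (invented): three paths, two of which touch the snarl's nodes.
example_gfa = "\n".join([
    "H\tVN:Z:1.0",
    "P\tchr1H\t292+,293+,294+,306+\t*",
    "P\tother\t293+,306+\t*",
    "P\tunrelated\t10+,11+\t*",
])

hits = paths_touching(example_gfa, [294, 295])
print(hits)
```

A graph built from minigraph output may have few or no embedded paths beyond the reference, in which case this will return little or nothing.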
