Given linear reference coordinates, how do I find the corresponding locations in a pangenome graph (e.g., in .gfa format)? For example, I want to ask what region chr1, positions 1000-2000, corresponds to, and have vg tell me that it corresponds to nodes 104-108 (or similar).
I have tried vg find and the vg indexing tools, and I have read about vg chunk, but I could not get this information. I would like to do this without having to run sequence alignment.
The command extracts the interval 10000000-10000100 of path GRCh38#0#chr12. Option -c 2 adds 2 nodes of context around the path, and -x graph.gbz selects the graph. The output is in vg format, but you can convert it to GFA with vg convert -f.
With the graph I happened to have available, the output was:
H VN:Z:1.1 RS:Z:GRCh38 CHM13
S 13998664 G
S 13998665 TTTTTTCTCCCCTGCTATCTCCTTCACTCTGAATTTCTTGTCAATTCACTTAAAATTATCTTTCATTCTAAATTTCTTTCTCTATACTCTTCAAATTATTTTGAGAATGAGGTGAAGAATAAGCATAACACGTATA
S 13998659 AGAGTTGGTGACTTGCTGCATTGTTCTTTAGTAGTGTTTGTGCTTTAAAGTGAGTTTCCTGGAATGGAAGAATTTTGATTTGATGAAATCACTTCATTTCAAGTACAGCTGTGCTTAGATGAATATTTGAGACTGAATGGTTCCTAGCACCTTTTCATCACAATATAATATGTATTGCACACACTTAACTTTATGAAATGGGCTTTTACATTATTTTAGCTTGCACTGAAAGTAGTTGTAAGAACAGCCATACAAAGCTAGCTGATGCATTTATTATAGGAATATTGGGGAGGATGTTGTCTATTTTGCCATAG
S 13998661 A
S 13998657 C
S 13998663 A
S 13998658 A
S 13998662 GTAACTCTTACATAGTTTTATGGCAGATTTTTATTTTTGTTAAATCACCATGACTAGCAAAGCTTTAAGTAGTAGGCATTTACCTTATAAATAATTTGAATTCTATTTTTCTTTACTTAGCAGAGTGTTAGACTTTTCAATATAGTGATTATGTGGGCTGCAGTAGCAGATATTTTGTTTATTAACATGCCCTTTGGCATAGCCCGTAG
S 13998656 ATAGTTTATCTCATTGTGAGAAGCAAGTGTAATATAATTGGATCTGATAGGGGCTAAATTAGCTGTTTCATAGTCTTGTGTTGGAAGAGATCATTCCTGATTATGACCTAGAATTTACACAACTTCTTGCATATTTCATACTTTTTTTTATTTGAAGGATCCACATGAGGCAGGCCTGATATTGCTTAGATTTAGGTATTGAAGGAAAACTATCAAATCAAGTAAAATAATA
S 13998660 G
L 13998664 + 13998665 + 0M
L 13998659 + 13998660 + 0M
L 13998659 + 13998661 + 0M
L 13998661 + 13998662 + 0M
L 13998657 + 13998659 + 0M
L 13998663 + 13998665 + 0M
L 13998658 + 13998659 + 0M
L 13998662 + 13998663 + 0M
L 13998662 + 13998664 + 0M
L 13998656 + 13998657 + 0M
L 13998656 + 13998658 + 0M
L 13998660 + 13998662 + 0M
W CHM13 0 chr12 9885687 9886581 >13998656>13998658>13998659>13998661>13998662>13998663>13998665
W GRCh38 0 chr12 9999532 10000426 >13998656>13998657>13998659>13998660>13998662>13998663>13998665
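To read node IDs for a coordinate range off GFA output like the above, you can walk along the W line and accumulate node lengths from the S lines. Below is a minimal Python sketch of that idea, not part of vg. The function name nodes_for_interval and the toy graph are made up for illustration. It assumes the W-line start offset is 0-based and the interval is half-open, that every S line carries its sequence inline, and that there is only one walk per sample and contig in the file.

```python
import re

def nodes_for_interval(gfa_text, sample, contig, start, end):
    """Return (node_id, node_start, node_end) for each node on the walk
    that overlaps the half-open interval [start, end)."""
    lengths = {}
    walk = None
    for line in gfa_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "S":
            # Assumes the sequence is given inline, not as "*".
            lengths[fields[1]] = len(fields[2])
        elif fields[0] == "W" and fields[1] == sample and fields[3] == contig:
            # W fields: sample, haplotype, contig, start, end, walk
            steps = re.findall(r"([<>])([^<>]+)", fields[6])
            walk = (int(fields[4]), steps)
    if walk is None:
        raise ValueError(f"no walk for {sample} {contig}")

    pos, steps = walk
    hits = []
    for _orientation, node in steps:
        node_end = pos + lengths[node]
        if pos < end and node_end > start:
            hits.append((node, pos, node_end))
        pos = node_end
    return hits

# Hypothetical toy graph, not real data.
TOY_GFA = """\
H VN:Z:1.1
S 1 ACGT
S 2 G
S 3 T
S 4 CCC
W ref 0 chrT 100 108 >1>2>4
W alt 0 chrT 50 58 >1>3>4
"""

if __name__ == "__main__":
    for hit in nodes_for_interval(TOY_GFA, "ref", "chrT", 103, 105):
        print(*hit)
```

For the graph above you would pass the converted GFA text with sample "GRCh38" and contig "chr12". Note that the walk only covers the extracted chunk, so coordinates outside the W line's range return nothing.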
I see, thank you. I believe the issue was that the PGGB files at /github.com/human-pangenomics/hpp_pangenome_resources do not have any GRCh38 paths at all.