How do I figure out pangenome location of hg38 coordinates?
1 day ago
a5864557 ▴ 10

Given linear reference coordinates, how do I find the corresponding locations in a pangenome graph (e.g., in .gfa format)? For example, I want to ask what region chr1:1000-2000 corresponds to, and have vg tell me that it corresponds to nodes 104-108 (or similar).

I have tried vg find and the vg indexing tools, and I have read about vg chunk, but I could not get this information. I would like to do this without having to run sequence alignment.

Perhaps it is possible to use a .paf file (https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/pggb/untangle/hprc-v1.0-pggb.all.vs.grch38.untangle-m10000-s0-j0.paf.gz) and .gfa file (https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/pggb/hprc-v1.0-pggb.gfa.gz) to accomplish this?

(Note: files from this link https://github.com/human-pangenomics/hpp_pangenome_resources.)

Otherwise, how is it possible to locate a feature (for example, a gene) without doing sequence alignment?

I am wondering how it is possible to do this in vg.

pangenome hg38 vg • 203 views
1 day ago
Jouni Sirén ▴ 580

You can use vg find option -p / --path for that. For example:

vg find -p "GRCh38#0#chr12:10000000-10000100" -c 2 -x graph.gbz | vg convert -f - > output.gfa

The command extracts the interval 10000000-10000100 of path GRCh38#0#chr12. Option -c 2 adds 2 nodes of context around the path, and -x graph.gbz selects the graph. The output of vg find is in vg format, but you can convert it to GFA with vg convert -f.

With the graph I happened to have available, the output was:

H       VN:Z:1.1        RS:Z:GRCh38 CHM13
S       13998664        G
S       13998665        TTTTTTCTCCCCTGCTATCTCCTTCACTCTGAATTTCTTGTCAATTCACTTAAAATTATCTTTCATTCTAAATTTCTTTCTCTATACTCTTCAAATTATTTTGAGAATGAGGTGAAGAATAAGCATAACACGTATA
S       13998659        AGAGTTGGTGACTTGCTGCATTGTTCTTTAGTAGTGTTTGTGCTTTAAAGTGAGTTTCCTGGAATGGAAGAATTTTGATTTGATGAAATCACTTCATTTCAAGTACAGCTGTGCTTAGATGAATATTTGAGACTGAATGGTTCCTAGCACCTTTTCATCACAATATAATATGTATTGCACACACTTAACTTTATGAAATGGGCTTTTACATTATTTTAGCTTGCACTGAAAGTAGTTGTAAGAACAGCCATACAAAGCTAGCTGATGCATTTATTATAGGAATATTGGGGAGGATGTTGTCTATTTTGCCATAG
S       13998661        A
S       13998657        C
S       13998663        A
S       13998658        A
S       13998662        GTAACTCTTACATAGTTTTATGGCAGATTTTTATTTTTGTTAAATCACCATGACTAGCAAAGCTTTAAGTAGTAGGCATTTACCTTATAAATAATTTGAATTCTATTTTTCTTTACTTAGCAGAGTGTTAGACTTTTCAATATAGTGATTATGTGGGCTGCAGTAGCAGATATTTTGTTTATTAACATGCCCTTTGGCATAGCCCGTAG
S       13998656        ATAGTTTATCTCATTGTGAGAAGCAAGTGTAATATAATTGGATCTGATAGGGGCTAAATTAGCTGTTTCATAGTCTTGTGTTGGAAGAGATCATTCCTGATTATGACCTAGAATTTACACAACTTCTTGCATATTTCATACTTTTTTTTATTTGAAGGATCCACATGAGGCAGGCCTGATATTGCTTAGATTTAGGTATTGAAGGAAAACTATCAAATCAAGTAAAATAATA
S       13998660        G
L       13998664        +       13998665        +       0M
L       13998659        +       13998660        +       0M
L       13998659        +       13998661        +       0M
L       13998661        +       13998662        +       0M
L       13998657        +       13998659        +       0M
L       13998663        +       13998665        +       0M
L       13998658        +       13998659        +       0M
L       13998662        +       13998663        +       0M
L       13998662        +       13998664        +       0M
L       13998656        +       13998657        +       0M
L       13998656        +       13998658        +       0M
L       13998660        +       13998662        +       0M
W       CHM13   0       chr12   9885687 9886581 >13998656>13998658>13998659>13998661>13998662>13998663>13998665
W       GRCh38  0       chr12   9999532 10000426        >13998656>13998657>13998659>13998660>13998662>13998663>13998665
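
If you want the node IDs for an exact interval rather than the whole extracted subgraph, you can walk along the reference W line and add up node lengths. The following is only a rough sketch, not something vg provides: the function name nodes_for_interval and the toy graph are made up for illustration, and it assumes the W-line start offset is 0-based (as in the GFA 1.1 walk definition) and that every node on the walk has an S line in the same file.

```python
import re

def nodes_for_interval(gfa_lines, sample, contig, start, end):
    """Return (node, orientation, node_start, node_end) for every step of the
    matching W lines that overlaps the half-open interval [start, end)."""
    lengths = {}
    walks = []
    for line in gfa_lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "S":
            # S <node id> <sequence>
            lengths[fields[1]] = len(fields[2])
        elif fields[0] == "W" and fields[1] == sample and fields[3] == contig:
            # W <sample> <haplotype> <contig> <start> <end> <walk>
            walks.append((int(fields[4]), fields[6]))
    hits = []
    for walk_start, walk in walks:
        pos = walk_start
        for orientation, node in re.findall(r"([<>])([^<>]+)", walk):
            node_end = pos + lengths[node]
            if pos < end and node_end > start:
                hits.append((node, orientation, pos, node_end))
            pos = node_end
    return hits

# Hypothetical toy graph: two walks that differ at one single-base node.
TOY_GFA = [
    "S\t1\tACGT",
    "S\t2\tG",
    "S\t3\tT",
    "S\t4\tCCCC",
    "W\tREF\t0\tchrT\t100\t109\t>1>2>4",
    "W\tALT\t0\tchrT\t100\t109\t>1>3>4",
]

for hit in nodes_for_interval(TOY_GFA, "REF", "chrT", 103, 106):
    print(hit)
```

On real data you would pass the lines of output.gfa with the sample and contig from the W line (here GRCh38 and chr12). Because the extracted subgraph includes context nodes, the interval test is what trims the result back to the nodes that actually cover the requested coordinates.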

I see, thank you. I believe the issue was that the PGGB files at https://github.com/human-pangenomics/hpp_pangenome_resources do not have any GRCh38 paths at all.
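
For anyone who wants to check this for their own graph, one way is to list the path names stored in the GFA and search them for the reference name. This is a minimal sketch under assumptions: the helper names path_names and matching are made up, the toy lines are not from the HPRC files, and joining the W-line fields with "#" simply mirrors the GRCh38#0#chr12 style of name used in the answer above.

```python
def path_names(gfa_lines):
    """Collect names of P lines and sample#haplotype#contig names of W lines."""
    names = []
    for line in gfa_lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "P":
            # P <path name> <segment list> <overlaps>
            names.append(fields[1])
        elif fields[0] == "W":
            # W <sample> <haplotype> <contig> <start> <end> <walk>
            names.append("#".join(fields[1:4]))
    return names

def matching(names, keyword):
    """Case-insensitive substring search over path names."""
    return [name for name in names if keyword.lower() in name.lower()]

# Hypothetical toy input, not taken from the HPRC graphs.
TOY_PATH_LINES = [
    "S\t1\tACGT",
    "S\t2\tG",
    "P\tchm13#chr1\t1+,2+\t*",
    "W\tGRCh38\t0\tchr12\t0\t5\t>1>2",
]

print(matching(path_names(TOY_PATH_LINES), "grch38"))
```

For a compressed graph, the lines can be read with gzip.open(path, "rt") and passed to path_names directly. An empty result for the reference name would be consistent with the graph having no such path, although the search is only as good as the keyword you give it.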
