Is there a quick way to calculate the average coverage of each node in a variation graph from a read alignment file (.gam, .json)?
6 months ago
Wenhai • 0

Is there a quick way to calculate the average coverage of each node in a variation graph from a read alignment file (.gam, .json)?

Thank you in advance!

vg • 1.1k views

You can convert the .gam into a .pack file using vg pack, and then print a table of the coverage at each node position using vg pack -d.
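The two steps above could be scripted roughly as follows. This is only a sketch: the file names are placeholders, the helpers build_pack_commands and write_coverage_table are hypothetical, and the flags -x, -g, -o and -i are assumed to behave as in recent vg releases, so check vg pack --help for your version.

```python
import subprocess

def build_pack_commands(graph, gam, pack):
    """Return the two vg pack invocations as argument lists.

    Assumed flags: -x graph, -g GAM input, -o pack output,
    -i pack input, -d print the coverage table to stdout.
    """
    make_pack = ["vg", "pack", "-x", graph, "-g", gam, "-o", pack]
    dump_table = ["vg", "pack", "-x", graph, "-i", pack, "-d"]
    return make_pack, dump_table

def write_coverage_table(graph, gam, pack, table_path):
    """Run both steps and save the table to table_path (requires vg on PATH)."""
    make_pack, dump_table = build_pack_commands(graph, gam, pack)
    subprocess.run(make_pack, check=True)
    with open(table_path, "w") as out:
        subprocess.run(dump_table, check=True, stdout=out)
```

Calling write_coverage_table("graph.xg", "aln.gam", "aln.pack", "coverage.tsv") would then leave the table in coverage.tsv, assuming those input files exist.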


What do the fields in the table mean? There are multiple records for the same node.id, and I don't quite understand the table. The first line also seems a bit strange.

seq.pos node.id node.offset coverage
0   78860   0   0
1   227651  0   6
2   227651  1   6
3   227651  2   6
4   227651  3   6
5   227651  4   6
6   227651  5   6
7   227651  6   5
8   227651  7   6
9   227651  8   6
10  227651  9   7
11  227651  10  7
12  227651  11  7
13  227651  12  7
14  227651  13  7
15  227651  14  7
16  227651  15  7
17  227651  16  7
18  227651  17  7
19  227651  18  7
20  227651  19  7
21  227651  20  7
22  227651  21  7
23  227651  22  7
24  227651  23  7
25  227651  24  7
26  227651  25  7
27  227651  26  7
28  227651  27  7
29  227651  28  7
30  227651  29  7
31  227651  30  7
32  227651  31  7
33  227651  32  7
34  227651  33  7
35  227651  34  7
36  227651  35  7
37  227651  36  7
38  227651  37  7
39  227651  38  7
40  227651  39  7
41  227651  40  7
42  227651  41  7
43  227651  42  7
44  227651  43  7
45  227651  44  7
46  227651  45  7
47  227651  46  7
48  227651  47  6
49  227651  48  6
50  227651  49  6
51  227651  50  5
52  227651  51  5
53  227651  52  5
54  227651  53  5
55  227651  54  4
56  227651  55  4
57  227651  56  4
58  227651  57  4
59  227651  58  4
60  227651  59  4
61  227651  60  4
62  227651  61  4

Each node consists of a number of bases, so the full position of a base in the pangenome graph is given by the combination of 1) the node ID and 2) the offset of the base on the node (numbered 0,...,len(node) - 1). These are the node.id and node.offset columns. The seq.pos column is an alternative identifier that numbers the bases consecutively, starting at 0. However, seq.pos on its own won't tell you which positions are on the same node, if that's what you're interested in.
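To see how the three position columns relate, the rows can be grouped by node.id. The sketch below uses the first few rows of the table posted above; node_spans is a hypothetical helper, and it assumes that all rows of a node are adjacent in the table, as they are in that excerpt.

```python
from itertools import groupby

# First rows of the table from the question above.
TABLE = """\
seq.pos node.id node.offset coverage
0 78860 0 0
1 227651 0 6
2 227651 1 6
3 227651 2 6
"""

def node_spans(table_text):
    """Return (node_id, first_seq_pos, rows_seen) for each run of rows on one node."""
    rows = [line.split() for line in table_text.splitlines()[1:] if line.strip()]
    spans = []
    for node_id, group in groupby(rows, key=lambda r: r[1]):
        group = list(group)
        spans.append((int(node_id), int(group[0][0]), len(group)))
    return spans

print(node_spans(TABLE))
# → [(78860, 0, 1), (227651, 1, 3)]
```

Read this way, the first line of the table is a separate node, 78860, with a single row at offset 0 and coverage 0, and node 227651 begins at seq.pos 1.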


My understanding is that each record gives the coverage of a single base. For example, 1 227651 0 6 indicates that the coverage of the first base of node 227651 is 6. So summing the coverage over all bases of a node and dividing by the length of the node gives the average coverage depth of that node. Is my understanding right?


Yes, that's it.
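A minimal sketch of that calculation, assuming the table is whitespace-separated with the four columns shown above and that every base of a node has its own row, so the number of rows per node equals the node length. The function name mean_node_coverage is made up for illustration.

```python
from collections import defaultdict

def mean_node_coverage(lines):
    """Average the per-base coverage over the rows of each node.id."""
    totals = defaultdict(int)   # summed coverage per node
    lengths = defaultdict(int)  # number of rows (bases) per node
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "seq.pos":
            continue  # skip blank lines and the header
        node_id, coverage = int(fields[1]), int(fields[3])
        totals[node_id] += coverage
        lengths[node_id] += 1
    return {node_id: totals[node_id] / lengths[node_id] for node_id in totals}

# A few rows copied from the table in the question above.
rows = [
    "seq.pos node.id node.offset coverage",
    "0 78860 0 0",
    "7 227651 6 5",
    "8 227651 7 6",
    "9 227651 8 6",
    "10 227651 9 7",
]
print(mean_node_coverage(rows))
# → {78860: 0.0, 227651: 6.0}
```

For a real table, pass an open file instead of the list, for example mean_node_coverage(open("coverage.tsv")), where coverage.tsv is a placeholder name. If only part of a node's rows is supplied, as in this excerpt, the result is the mean over those rows only.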


I got it. Thanks.
