Question about the various graph formats: how do I get the sequence and source of a specific node?
Posted by chuanj8848, 11 weeks ago

A .gfa file is a text-based format that describes the structure of a pan-genome graph. I could write a script to parse it, but running it would be time-consuming because of the file size.
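For a one-off lookup, a parser does not have to load the whole GFA into memory. The sketch below (an illustration, not part of vg) streams the file line by line and stops at the first `S` (segment) line whose name matches. It assumes a tab-separated GFA 1.x file where segment lines look like `S <name> <sequence> [tags]`; the function name `segment_sequence` is hypothetical.

```python
import io
from typing import Optional, TextIO


def segment_sequence(gfa: TextIO, node_id: str) -> Optional[str]:
    """Return the sequence of segment `node_id`, or None if it is absent.

    Reads the file one line at a time, so memory use stays small even for
    a very large graph, and stops at the first matching S line.
    """
    for line in gfa:
        # GFA segment lines: S <tab> name <tab> sequence [<tab> optional tags]
        if not line.startswith("S\t"):
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 3 and fields[1] == node_id:
            return fields[2]
    return None


# Tiny in-memory example; with a real file use: open("graph.gfa") (hypothetical path)
example = io.StringIO(
    "H\tVN:Z:1.1\n"
    "S\t1\tACGT\n"
    "S\t2\tGG\n"
    "L\t1\t+\t2\t+\t0M\n"
)
print(segment_sequence(example, "2"))  # prints "GG"
```

This is still a linear scan, so each query costs one pass over the file in the worst case; it only avoids the memory cost, not the time cost.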

However, vg also uses several other formats, such as .gbz, .vg, and .xg. These are all binary, so I can't easily tell what information they contain or what can be extracted from them.

Is there a way to get the source and sequence of a specific node/segment? By "source" I mean which haplotype(s) contain this node.
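In a GFA file, the "source" of a node is recorded indirectly: haplotypes are stored as `P` lines (GFA 1.0 paths, steps like `1+,2-`) or `W` lines (GFA 1.1 walks, steps like `>1<2`), so finding which haplotypes contain a node means scanning those lines for its name. The sketch below assumes exactly those two line layouts; the `sample#hap_index#seq_id` label for walks is a PanSN-like naming chosen for illustration, and `haplotypes_containing` is a hypothetical helper, not a vg command.

```python
import io
import re
from typing import Set, TextIO

# One step of a W-line walk such as ">12" or "<7"; captures the segment name.
WALK_STEP = re.compile(r"[<>]([^<>]+)")


def haplotypes_containing(gfa: TextIO, node_id: str) -> Set[str]:
    """Return labels of every P path or W walk that visits segment `node_id`.

    P lines (GFA 1.0): P <name> <seg1+,seg2-,...> <overlaps>
    W lines (GFA 1.1): W <sample> <hap_index> <seq_id> <start> <end> <walk>
    """
    hits: Set[str] = set()
    for line in gfa:
        fields = line.rstrip("\r\n").split("\t")
        if fields[0] == "P" and len(fields) >= 3:
            # Drop the trailing orientation character (+/-) from each step.
            steps = {step[:-1] for step in fields[2].split(",")}
            if node_id in steps:
                hits.add(fields[1])
        elif fields[0] == "W" and len(fields) >= 7:
            if node_id in WALK_STEP.findall(fields[6]):
                hits.add(f"{fields[1]}#{fields[2]}#{fields[3]}")
    return hits


example = io.StringIO(
    "S\t1\tACGT\n"
    "S\t2\tGG\n"
    "S\t3\tT\n"
    "P\tref\t1+,3+\t*\n"
    "W\tsample1\t1\tchr6\t0\t6\t>1>2\n"
    "W\tsample2\t2\tchr6\t0\t5\t>1<3\n"
)
print(sorted(haplotypes_containing(example, "3")))  # ['ref', 'sample2#2#chr6']
```

Combined with a segment-sequence lookup, this answers both halves of the question in plain GFA, at the cost of reading the path/walk lines once per query.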

Tag: vg

vg convert can convert those formats into GFA, and vg chunk can be used to query small regions of the graph. However, vg chunk loads the entire graph into memory for each query. That is fast enough for individual interactive queries, but too slow to work well as a backend for programmatic queries. A more responsive SQL-based query interface is currently under development here.
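If you do convert to GFA and need many programmatic lookups, one generic workaround (a swapped-in technique, not something vg provides) is to scan the file once, record the byte offset of every `S` line, and then seek directly to a segment for each query. The sketch below assumes a tab-separated GFA on disk; `build_segment_index` and `lookup_sequence` are hypothetical names, and the index itself is an in-memory dict, so it grows with the number of segments.

```python
import os
import tempfile
from typing import Dict


def build_segment_index(path: str) -> Dict[str, int]:
    """One pass over the GFA: map each segment name to the byte offset of its S line."""
    index: Dict[str, int] = {}
    with open(path, "rb") as fh:
        offset = 0
        for line in fh:
            if line.startswith(b"S\t"):
                name = line.split(b"\t", 2)[1].decode()
                index[name] = offset
            offset += len(line)  # raw byte length, including the newline
    return index


def lookup_sequence(path: str, index: Dict[str, int], node_id: str) -> str:
    """Seek straight to the indexed S line instead of rescanning the file."""
    with open(path, "rb") as fh:
        fh.seek(index[node_id])
        fields = fh.readline().rstrip(b"\r\n").split(b"\t")
        return fields[2].decode()


# Demo on a throwaway file (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    gfa_path = os.path.join(tmp, "toy.gfa")
    with open(gfa_path, "w") as out:
        out.write("H\tVN:Z:1.1\nS\t1\tACGT\nS\t2\tGG\n")
    idx = build_segment_index(gfa_path)
    print(lookup_sequence(gfa_path, idx, "2"))  # prints "GG"
```

The index could be pickled or written to a small table so the one-time scan is not repeated across sessions; a proper database would be the more robust version of the same idea.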

5 weeks ago

Hi,

I had a similar requirement before: identifying whether certain samples contained specific nodes from a GFA file. I have uploaded the tool I wrote for this to GitHub: https://github.com/zhangyixing3/pantools

Fri Oct 11 17:50:01 stu_zhangyixing c050~
$ head nodes.20row 
1
2
3
4
5
6
7
8
9
10

Fri Oct 11 17:50:24 stu_zhangyixing c050~
$ gfar pav -g DRB1-3123.w.gfa -n nodes.20row -o testtt
2024/10/11 17:57 [DEBUG]pav.rs:12   GFA file parsed successfully
2024/10/11 17:57 [DEBUG]pav.rs:20   The number of nodes to be analyzed is: 20
2024/10/11 17:57 [DEBUG]pav.rs:29   total number of samples: 12
Done!, gfar version 0.1
CMD: gfar pav -g DRB1-3123.w.gfa -n nodes.20row -o testtt
Real time: 0 sec; CPU: 0 sec; Peak RSS: 0.004 GB


Fri Oct 11 17:57:07 stu_zhangyixing c050~
$ head testtt 
node    sample10    sample9 sample11    sample7 sample12    sample8 sample1 sample3 sample4 sample2 sample6 sample5
19  1   0   0   0   0   0   0   0   0   0   1   0
20  0   1   1   1   0   1   1   0   0   1   0   1
9   0   0   0   0   1   0   0   1   1   0   0   0
13  1   1   1   1   0   1   1   0   0   1   1   1
17  0   1   1   1   0   1   1   0   0   1   0   1
7   1   0   0   0   0   0   0   0   0   0   1   0
11  0   0   0   0   1   0   0   1   1   0   0   0
15  1   0   0   0   0   0   0   0   0   0   1   0
5   1   1   1   0   0   1   1   0   0   1   1   1
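The presence/absence (PAV) idea behind the output above can be sketched in Python. This is not gfar's code (gfar is written in Rust and its internals are not shown here); it assumes the haplotypes are stored as GFA 1.1 `W` lines and that a node counts as present (1) in a sample if any of that sample's walks visits it. `pav_matrix` is a hypothetical name, and samples are simply sorted alphabetically.

```python
import io
import re
from typing import Dict, List, Set, TextIO

# One step of a W-line walk such as ">12" or "<7"; captures the segment name.
WALK_STEP = re.compile(r"[<>]([^<>]+)")


def pav_matrix(gfa: TextIO, nodes: List[str]) -> Dict[str, Dict[str, int]]:
    """Presence (1) / absence (0) of each query node per sample, from W lines."""
    wanted: Set[str] = set(nodes)
    seen: Dict[str, Set[str]] = {}  # sample -> query nodes visited by any of its walks
    for line in gfa:
        fields = line.rstrip("\r\n").split("\t")
        if fields[0] != "W" or len(fields) < 7:
            continue
        visited = seen.setdefault(fields[1], set())  # fields[1] is the sample name
        visited.update(wanted.intersection(WALK_STEP.findall(fields[6])))
    samples = sorted(seen)
    return {node: {s: int(node in seen[s]) for s in samples} for node in nodes}


example = io.StringIO(
    "W\tsample1\t1\tchr6\t0\t6\t>1>2\n"
    "W\tsample1\t2\tchr6\t0\t5\t>1>3\n"
    "W\tsample2\t1\tchr6\t0\t5\t>1>3\n"
)
for node, row in pav_matrix(example, ["2", "3"]).items():
    print(node, *row.values())  # prints "2 1 0", then "3 1 1"
```

Only the query nodes are kept per sample, so memory stays proportional to (samples × query nodes) rather than to the whole graph.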
