vg convert does not preserve the order of the reads
17 months ago
Yihang • 0

When I use vg convert (v1.39) to convert a GAM file to a GAF file (or vice versa), the order of the reads is not preserved. For example, when I sort a GAF file by read ID and then convert it to GAM, the reads are no longer ordered by read ID.

I am curious about how vg convert works. Is there any way to do the conversion while preserving the read order?

Thanks!

graph variant vg vgteam • 1.2k views

I believe the current vg release is 1.48. Why use such an old version? There may also be no reason for vg convert to preserve read order, and it may be more efficient this way.


I tried vg 1.49 (the current version), but it still does not work. I want the read order preserved because, when we map some special kinds of paired reads with vg, a preserved order would significantly simplify the downstream analysis.


The read order is not preserved because vg convert processes the alignments in parallel, which introduces nondeterminism from the scheduler. I think you should be able to preserve read order by running with one thread (-t 1), but it will run much slower.


Thanks! I also thought about this, so I ran some extra experiments. Here are my observations.

When I run vg convert -t 1 -F on a .gaf file that is already sorted by read ID (1, 2, 3, 4, ...), the output .gam file still does not preserve the order (the output order is 33281, 33282, 33283, ...). If I run this command multiple times, the order in the .gam is the same each time (33281, 33282, 33283, ...).

When I run vg convert -F without -t 1, the order is not preserved, as I observed before. However, if I run this command multiple times, the order in the .gam differs between runs.

Therefore, I think multithreading does affect the order of the output .gam file. However, it does not explain why, with one thread, the order still differs from the input .gaf file. Do you know why this happens?
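
For anyone who wants to reproduce this kind of check, here is a minimal sketch of how the read order of two GAF files could be compared, for example the sorted input and a GAF obtained by converting the output .gam back. It is only an illustration: the function names `gaf_read_names` and `first_order_mismatch` are made up for this example, and it assumes the read name is the first tab-separated column of each GAF line.

```python
from itertools import zip_longest


def gaf_read_names(lines):
    """Return the read names (first tab-separated column) in file order.

    Blank lines are skipped. Assumes plain GAF text, one alignment per line.
    """
    names = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        names.append(line.split("\t", 1)[0])
    return names


def first_order_mismatch(expected, observed):
    """Return (index, expected_name, observed_name) at the first position
    where the two name lists differ, or None if they are identical.

    A missing entry (lists of different length) is reported as None.
    """
    for index, (a, b) in enumerate(zip_longest(expected, observed)):
        if a != b:
            return index, a, b
    return None


# Small in-memory demonstration with made-up records.
before = gaf_read_names(["1\t100\n", "2\t100\n", "3\t100\n"])
after = gaf_read_names(["33281\t100\n", "33282\t100\n", "33283\t100\n"])
print(first_order_mismatch(before, after))
```

With real files, the same functions can be fed `open("input.gaf")` and `open("roundtrip.gaf")` (hypothetical file names), since file objects iterate over lines.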


If you just want the reads ordered by read ID, you can use the Unix sort command if your input is a GAF file. If you have a GAM, it's more problematic.

This toolset might be useful for GAF sorting - https://github.com/marschall-lab/gaftools
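
As a rough alternative to the sort command for the GAF case, here is a small Python sketch that orders GAF lines by read name. It is not taken from vg or gaftools; the helper names `natural_key` and `sort_gaf_lines` are hypothetical, and it assumes the read name is the first tab-separated column. It compares digit runs numerically, so read 10 comes after read 2 rather than before it.

```python
import re


def natural_key(name):
    """Split a read name into alternating text and integer chunks.

    re.split with a capturing group always yields text at even positions and
    digit runs at odd positions, so keys from different names stay comparable.
    """
    parts = re.split(r"(\d+)", name)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


def sort_gaf_lines(lines):
    """Return the non-blank GAF lines sorted by read name (column 1).

    sorted() is stable, so several alignments of the same read keep their
    original relative order.
    """
    records = [line for line in lines if line.strip()]
    return sorted(records, key=lambda line: natural_key(line.split("\t", 1)[0]))


# Small in-memory demonstration with made-up records.
example = ["10\tx\n", "2\ty\n", "1\tz\n", "2\tw\n"]
for record in sort_gaf_lines(example):
    print(record, end="")
```

This loads every line into memory, which is fine for moderate files; for very large GAF files an external sort is likely the better choice.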

