17 months ago
Yihang
When I use vg convert (v1.39) to convert a GAM file to a GAF file (or vice versa), the order of the reads is not preserved. For example, when I sort the reads by read ID in a GAF file and then convert it to a GAM file, the reads are no longer ordered by read ID.
I am very curious about how vg convert works. Is there any way to do the conversion while preserving the read order?
Thanks!
The current vg is 1.48, I believe. Why use such an old version? There might also be no reason to preserve read order, and vg convert might be more efficient this way.
I tried vg 1.49 (the current version), but it still does not work. The reason I want the read order preserved is that when we map some special kinds of paired reads to VG, preserving the order would significantly simplify the downstream analysis.
The reason that the read order is not preserved is that vg convert runs in parallel over the alignments, which introduces nondeterminism from the scheduler. I think you should be able to preserve read order by running in one thread (-t 1), but it will run much slower.
Thanks! Yeah, I also thought about this point, so I did some extra experiments. Here are some observations.
When I use vg convert -t 1 -F on a .gaf file that is already sorted by read ID (1, 2, 3, 4, ...), the output .gam file still does not preserve the order (the output order is 33281, 33282, 33283, ...). If I run this command multiple times, the order in the .gam file is the same every time (33281, 33282, 33283, ...).
When I use vg convert -F without -t 1, the order is not preserved, as I observed before. However, if I run this command multiple times, the order in the .gam file differs between runs.
Therefore, I think multithreading does affect the order of the output .gam file. However, it cannot explain why, when I use one thread, the order is not the same as in the input .gaf file. Do you know why this happens?
If you just want the reads ordered by read ID, you can use the Unix sort command if your input is a GAF file. If you have a GAM, it's more problematic.
This toolset might be useful for GAF sorting - https://github.com/marschall-lab/gaftools
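For the GAF side, the sort suggested above could look roughly like the sketch below. It assumes the read IDs are plain integers in the first tab-separated column (as in the 1, 2, 3, 4... example in this thread) and that your sort supports -s (GNU and BSD sort do). The file names and the three truncated records are made up for illustration; real GAF records have more columns.

```shell
# Hypothetical three-read GAF; only the first few columns are shown for brevity.
printf '10\t150\t0\t150\t+\n2\t150\t0\t150\t+\n1\t150\t0\t150\t+\n' > reads.gaf

# Sort on column 1 (the read name) only.
# -n compares the IDs as numbers, so 2 comes before 10.
# -s keeps the input order for records that share a read name.
sort -t "$(printf '\t')" -s -k1,1n reads.gaf > reads.sorted.gaf

# Show the resulting read-name order.
cut -f1 reads.sorted.gaf
```

If the read names are not purely numeric, dropping the n (so the key is -k1,1) gives a plain lexical sort instead. This only orders the GAF; it does not change how vg convert orders the records it writes to a GAM.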