Question

Using vg gamsort with naive sorting algorithm

2.2 years ago

AshleeThomson ▴ 130

Hello,

I am currently using vg on an HPC to map reads to a graph I built, and I'm having issues when sorting the output GAM:

vg gamsort \
    -p \
    ${filename}_mapped.gam > ${filename}_mapped.sorted.gam

break into sorted chunks [                       ]  0.0%
break into sorted chunks [=                      ]  1.1%
break into sorted chunks [=                      ]  2.2%
break into sorted chunks [=                      ]  3.3%
break into sorted chunks [==                     ]  4.4%
[E::bgzf_flush] File write failed (wrong size)
terminate called after throwing an instance of 'std::runtime_error'
  what():  io::MessageEmitter::emit_group: I/O error writing protobuf

I previously ran 20 samples without any issues, but this latest batch of samples keeps erroring out at this point. The GAMs are no bigger than those in the first batch, so I'm unsure what has changed. I finally got a sample that errored out with this message:

break into sorted chunks [                       ]  0.0%
break into sorted chunks [=                      ]  0.4%
break into sorted chunks [=                      ]  0.8%
break into sorted chunks [=                      ]  1.2%
[vg utility.cpp]: couldn't create temp directory: /tmp/vg-mYzrsD

I have tried cleaning out my tmp folder and changing my environment TMPDIR to a local TEMP directory but it hasn't helped.

My question is: does anyone have experience with the naive sorting algorithm option of vg gamsort?

-d / --dumb-sort        use naive sorting algorithm (no tmp files, faster for small GAMs)

I was hoping that since it doesn't produce tmp files it may solve my problem, but wasn't sure if it would affect the actual sorting.

Alternatively, is there another way to sort GAM files?

Thanks in advance.



score 1 · Answer 1 · 2024-04-23

The -d sorting mode does the entire sort in memory, so it will only work if you have enough memory to load your whole GAM file.
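As a rough pre-check before trying -d, you could compare the GAM's on-disk size with the memory currently available. This is only a sketch: the function name is made up for illustration, it assumes Linux with /proc/meminfo, and the on-disk size is probably an underestimate of what the in-memory sort needs (assuming the GAM is compressed on disk), so leave generous headroom.

```shell
# Hypothetical helper: returns 0 if the GAM's on-disk size is below available RAM,
# 1 if it is not, and 2 if available memory cannot be determined.
# Assumption: Linux, where /proc/meminfo exposes a MemAvailable line in kB.
gam_fits_in_memory() {
    gam_file=$1
    [ -r /proc/meminfo ] || return 2
    # Size of the file in 1 KiB blocks
    gam_kb=$(du -k "$gam_file" | awk '{print $1}')
    # Memory the kernel estimates is available to new processes, in kB
    avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
    [ -n "$avail_kb" ] || return 2
    [ "$gam_kb" -lt "$avail_kb" ]
}

# Example use (file name taken from the question):
#   gam_fits_in_memory "${filename}_mapped.gam" && echo "worth trying vg gamsort -d"
```

On a cluster the job's memory limit, not the node's free memory, is usually the real ceiling, so treat a passing check as necessary rather than sufficient.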

To find a temp directory, vg consults, in order, TMPDIR, TMP, TEMP, TEMPDIR, and USERPROFILE, and uses the first one set. So setting TMPDIR (as long as you export it from the shell to the actual environment) should tell it where to keep its partially-sorted files.
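To make the export point concrete, here is a minimal sketch. The scratch directory path and the fallback sample name are placeholders, not anything vg requires; the vg gamsort call is the one from the question and only runs if vg and the input file are present.

```shell
# Hypothetical scratch directory; point this at a filesystem with plenty of free space.
scratch_dir="${PWD}/vg_tmp"
mkdir -p "$scratch_dir"

# export places the variable in the environment that child processes (such as vg) inherit.
export TMPDIR="$scratch_dir"

# Confirm that a child process really sees the new value.
child_view=$(sh -c 'printf "%s" "$TMPDIR"')
echo "child processes see TMPDIR=${child_view}"

# The sort command from the question, guarded so the sketch is harmless without vg.
filename="${filename:-sample}"   # placeholder sample name
if command -v vg >/dev/null 2>&1 && [ -f "${filename}_mapped.gam" ]; then
    vg gamsort -p "${filename}_mapped.gam" > "${filename}_mapped.sorted.gam"
fi
```

In a batch job, the export has to happen inside the job script itself (or be passed through by the scheduler), since a variable set in your login shell is not necessarily carried over to the compute node.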

How does the free space reported in the temporary directory you set (df -h $TMPDIR) compare to the size of the GAM file you are trying to sort (du -hs ${filename}_mapped.gam)? It sounds like your "local TEMP directory" still might not have enough space.
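The same comparison can be scripted so it runs before every sort. This is a sketch with a made-up function name; it assumes the temp directory needs at least as much free space as the input GAM, which is a guess at a lower bound rather than a documented requirement.

```shell
# Hypothetical helper: returns 0 if the filesystem holding <dir> has more free
# space than the on-disk size of <file>, and non-zero otherwise.
enough_tmp_space() {
    file=$1
    dir=$2
    # Size of the input file in 1 KiB blocks
    need_kb=$(du -k "$file" | awk '{print $1}')
    # Available space on the filesystem holding dir, in 1 KiB blocks
    # (-P gives POSIX single-line output; column 4 is "Available")
    free_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$free_kb" -gt "$need_kb" ]
}

# Example use (names taken from the question):
#   enough_tmp_space "${filename}_mapped.gam" "${TMPDIR:-/tmp}" \
#       || echo "temp directory is too small for this GAM" >&2
```

If several sorts run at once and share the same temp directory, the free space is shared too, which could explain why a batch fails even though each GAM is no larger than before.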