Oleksii • 17 months ago
Is it possible to use the HTSlib thread pool when loading large (genomic) FASTA sequences? Here's a code sample:
#include <htslib/hts.h>
#include <htslib/sam.h>
#include <htslib/faidx.h>
#include <htslib/bgzf.h>
#include <htslib/thread_pool.h>
char fn[] = "/path/to/bgzipped.fasta.fa.gz";
int nthreads = 4;
faidx_t *faidx = fai_load(fn);
hts_tpool *tpool = hts_tpool_init(nthreads);
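// NOTE: the next line is the problem: faidx_t is an opaque type, so
// faidx->bgzf is not visible outside HTSlib and this does not compile.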
bgzf_thread_pool(faidx->bgzf, tpool, 0);
int fetched = 0;
for (int i=0; i<faidx_nseq(faidx); i++) {
const char *name = faidx_iseq(faidx, i);
int64_t length = faidx_seq_len(faidx, name);
char *sequence = faidx_fetch_seq(faidx, name, 0, length-1, &fetched);
// Do something
free(sequence);
}
fai_destroy(faidx);
hts_tpool_destroy(tpool);
It won't compile because faidx_t is opaque, so the compiler doesn't know that the structure has a bgzf field. Does anyone know how to do this correctly?
Update on 13.07.2023: it is now possible using fai_thread_pool(faidx, tpool, 0);
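For anyone landing here later, here is a minimal sketch of the original snippet rewritten around that call. It assumes an HTSlib build recent enough to provide fai_thread_pool, and the FASTA path is a placeholder:

#include <stdio.h>
#include <stdlib.h>
#include <htslib/faidx.h>
#include <htslib/thread_pool.h>

int main(void) {
    const char *fn = "/path/to/bgzipped.fasta.fa.gz";  // placeholder path
    int nthreads = 4;

    faidx_t *faidx = fai_load(fn);
    if (!faidx) {
        fprintf(stderr, "fai_load failed for %s\n", fn);
        return 1;
    }

    hts_tpool *tpool = hts_tpool_init(nthreads);
    if (!tpool) {
        fai_destroy(faidx);
        return 1;
    }

    // Attach the thread pool to the faidx handle so bgzf decompression
    // can use the worker threads (queue size 0 uses the default).
    if (fai_thread_pool(faidx, tpool, 0) != 0) {
        fprintf(stderr, "fai_thread_pool failed\n");
        fai_destroy(faidx);
        hts_tpool_destroy(tpool);
        return 1;
    }

    for (int i = 0; i < faidx_nseq(faidx); i++) {
        const char *name = faidx_iseq(faidx, i);
        int length = faidx_seq_len(faidx, name);
        int fetched = 0;
        char *sequence = faidx_fetch_seq(faidx, name, 0, length - 1, &fetched);
        if (!sequence) {
            fprintf(stderr, "fetch failed for %s\n", name);
            continue;
        }
        // Do something with the sequence here.
        free(sequence);
    }

    fai_destroy(faidx);
    hts_tpool_destroy(tpool);
    return 0;
}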
My first thought is: how much of the CPU time is in Deflate and how much is in parsing the data? The thread pool will aid decompression only. A perf record on fetching chr2 from a bgzf-compressed human reference showed that even if you could parallelise the decompression step away to nothing, it'd still be under 30% quicker. I haven't checked out your change in anger though. Maybe I'm misremembering this and there is parallel parsing as well. (I think not, but... maybe.)
In answer to your question though: not that I'm aware of. There isn't a trivial way of controlling the bgzf component of the faidx struct. Although I note that bgzf is the first component, so you could (ghastly!) do bgzf_thread_pool(*(BGZF **)faidx, tpool, 0). Obviously that totally breaks the opaque type and is not my recommended solution! However, it should permit you to compile and test the performance change (see the sketch just below). If it's significant, then please raise an issue on the htslib GitHub site so we can look into a formal API, in a similar vein to the fai_set_cache_size function.
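(A compilable sketch of that cast workaround, for reference only. It leans on the assumption stated above that the BGZF handle is the first member of the private faidx_t struct; the FASTA path is a placeholder, and this is a benchmarking hack, not something to ship.)

#include <stdlib.h>
#include <htslib/bgzf.h>
#include <htslib/faidx.h>
#include <htslib/thread_pool.h>

int main(void) {
    const char *fn = "/path/to/bgzipped.fasta.fa.gz";  // placeholder path
    faidx_t *faidx = fai_load(fn);
    if (!faidx) return 1;

    hts_tpool *tpool = hts_tpool_init(4);

    // Ghastly: pun the opaque faidx_t* to reach its first member,
    // assumed to be the BGZF* handle, and attach the pool to it.
    bgzf_thread_pool(*(BGZF **)faidx, tpool, 0);

    int fetched = 0;
    const char *name = faidx_iseq(faidx, 0);
    char *seq = faidx_fetch_seq(faidx, name, 0,
                                faidx_seq_len(faidx, name) - 1, &fetched);
    // Do something with the sequence here.
    free(seq);

    fai_destroy(faidx);
    hts_tpool_destroy(tpool);
    return 0;
}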
Thanks very much! (and sorry if I caused any anger)
The only computer at hand is an ARM64 Mac (fast SSD, fast memory), where I timed complete ingestion and processing of hg38 (889 MB, bgzipped). Reading alone takes 14.48 s with no threads, or 8.67 s with a single additional thread.
For me, a 30% speedup is great anyway. I can check a 5-year-old Linux PC tomorrow.
Update from an i7-7700, 64 GB RAM, Ubuntu 20.04, reading and processing.
Do you think it may qualify for an API change?
Uh? How is this related to your problem? Why can't it compile?
The compiler throws an error at faidx->bgzf, because the internal structure of faidx is not exported. The question is how to bind the thread pool to the FASTA file handle otherwise.
Ah ok, I see now, thanks.