Oleksii • 17 months ago
Is it possible to use the HTSlib thread pool when loading large (genomic) FASTA sequences? Here's a code sample:
#include <htslib/hts.h>
#include <htslib/sam.h>
#include <htslib/faidx.h>
#include <htslib/bgzf.h>
#include <htslib/thread_pool.h>
char fn[] = "/path/to/bgzipped.fasta.fa.gz";
int nthreads = 4;
faidx_t *faidx = fai_load(fn);
hts_tpool *tpool = hts_tpool_init(nthreads);
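// NOTE: the next line is the problem: faidx_t is an opaque type, so
// faidx->bgzf is not visible outside HTSlib and this does not compile.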
bgzf_thread_pool(faidx->bgzf, tpool, 0);
int fetched = 0;
for (int i=0; i<faidx_nseq(faidx); i++) {
const char *name = faidx_iseq(faidx, i);
int64_t length = faidx_seq_len(faidx, name);
char *sequence = faidx_fetch_seq(faidx, name, 0, length-1, &fetched);
// Do something
free(sequence);
}
fai_destroy(faidx);
hts_tpool_destroy(tpool);
It won't compile because faidx_t is opaque, so the compiler doesn't know that the structure has a bgzf field. Does anyone know how to do this correctly?
Update on 13.07.2023: it is now possible using fai_thread_pool(faidx, tpool, 0);
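For anyone landing here later, here is a minimal sketch of the original snippet rewritten around that call. It assumes an HTSlib build recent enough to provide fai_thread_pool, and the FASTA path is a placeholder:

#include <stdio.h>
#include <stdlib.h>
#include <htslib/faidx.h>
#include <htslib/thread_pool.h>

int main(void) {
    const char *fn = "/path/to/bgzipped.fasta.fa.gz";  // placeholder path
    int nthreads = 4;

    faidx_t *faidx = fai_load(fn);
    if (!faidx) {
        fprintf(stderr, "fai_load failed for %s\n", fn);
        return 1;
    }

    hts_tpool *tpool = hts_tpool_init(nthreads);
    if (!tpool) {
        fai_destroy(faidx);
        return 1;
    }

    // Attach the thread pool to the faidx handle so bgzf decompression
    // can use the worker threads (queue size 0 uses the default).
    if (fai_thread_pool(faidx, tpool, 0) != 0) {
        fprintf(stderr, "fai_thread_pool failed\n");
        fai_destroy(faidx);
        hts_tpool_destroy(tpool);
        return 1;
    }

    for (int i = 0; i < faidx_nseq(faidx); i++) {
        const char *name = faidx_iseq(faidx, i);
        int length = faidx_seq_len(faidx, name);
        int fetched = 0;
        char *sequence = faidx_fetch_seq(faidx, name, 0, length - 1, &fetched);
        if (!sequence) {
            fprintf(stderr, "fetch failed for %s\n", name);
            continue;
        }
        // Do something with the sequence here.
        free(sequence);
    }

    fai_destroy(faidx);
    hts_tpool_destroy(tpool);
    return 0;
}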
My first thought is: how much of the CPU time is in Deflate and how much is in parsing the data? The thread pool will aid decompression only. A perf record on fetching chr2 from a bgzf-compressed human reference showed that even if you could parallelise the decompression step away to nothing, it'd still be under 30% quicker. I haven't checked out your change in anger though. Maybe I'm misremembering this and there is parallel parsing as well. (I think not, but... maybe.)
In answer to your question though: not that I'm aware of. There isn't a trivial way of controlling the bgzf component of the faidx struct. Although I note that bgzf is the first component, so you could (ghastly!) do bgzf_thread_pool(*(BGZF **)faidx, tpool, 0). Obviously that totally breaks the opaque type and is not my recommended solution! However, it should permit you to compile and test the performance change (see the sketch just below). If it's significant, then please raise an issue on the htslib GitHub site so we can look into a formal API, in a similar vein to the fai_set_cache_size function.
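(A compilable sketch of that cast workaround, for reference only. It leans on the assumption stated above that the BGZF handle is the first member of the private faidx_t struct; the FASTA path is a placeholder, and this is a benchmarking hack, not something to ship.)

#include <stdlib.h>
#include <htslib/bgzf.h>
#include <htslib/faidx.h>
#include <htslib/thread_pool.h>

int main(void) {
    const char *fn = "/path/to/bgzipped.fasta.fa.gz";  // placeholder path
    faidx_t *faidx = fai_load(fn);
    if (!faidx) return 1;

    hts_tpool *tpool = hts_tpool_init(4);

    // Ghastly: pun the opaque faidx_t* to reach its first member,
    // assumed to be the BGZF* handle, and attach the pool to it.
    bgzf_thread_pool(*(BGZF **)faidx, tpool, 0);

    int fetched = 0;
    const char *name = faidx_iseq(faidx, 0);
    char *seq = faidx_fetch_seq(faidx, name, 0,
                                faidx_seq_len(faidx, name) - 1, &fetched);
    // Do something with the sequence here.
    free(seq);

    fai_destroy(faidx);
    hts_tpool_destroy(tpool);
    return 0;
}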
Thanks very much! (and sorry if I caused any anger)
The only computer at hand is an ARM64 Mac (fast SSD, fast memory), where I timed complete ingestion and processing of hg38 (889 MB, bgzipped). Reading alone takes 14.48 s with no threads, or 8.67 s with a single additional thread.
For me, a 30% speedup is great anyway. I can check a 5-year-old Linux PC tomorrow.
Update from an i7-7700, 64 GB RAM, Ubuntu 20.04, reading and processing.
Do you think it may qualify for an API change?
Uh? How is this related to your problem? Why can't it compile?
The compiler throws an error at faidx->bgzf, because the internal structure of faidx is not exported. The question is how to bind the thread pool to the FASTA file handle otherwise.
Ah ok, I see now, thanks.