Dear all,
I'm currently writing a program that needs random access to large files. After looking around a bit I found BGZIP, which is widely used in SAMtools.
I tried to integrate it into my program, but I'm getting an error: "Error: invalid block header"
Also, if I set start_pos = 0 it works.
I've tried decompressing the file with bgzip (compiled from samtools 1.18) and it works fine! Here is the code I'm using:
    BGZF* in_glf_fh;
    unsigned int total_bytes_read = 0;

    // Define chunk start and end positions
    unsigned int start_pos = 2203 * 10000;
    unsigned int end_pos = start_pos + 10000 - 1;
    unsigned int chunk_size = end_pos - start_pos + 1;

    // Open input file
    in_glf_fh = bgzf_open(pars->in_glf, "rb");
    if( in_glf_fh == NULL )
        error("ERROR: cannot open GLF file!");

    // Search start position
    if( bgzf_seek(in_glf_fh, start_pos * pars->n_ind * 3 * sizeof(double), SEEK_SET) < 0 )
        error("ERROR: cannot seek GLF file!");

    // Read data from file
    for(unsigned int c = 0; c < chunk_size; c++) {
        int bytes_read = bgzf_read(in_glf_fh, chunk_data[c], sizeof(double) * pars->n_ind * 3);
        if( (unsigned int) bytes_read != sizeof(double) * pars->n_ind * 3 )
            fprintf(stderr, "Error: %s\n", in_glf_fh->error);
        total_bytes_read += bytes_read;
    }

    bgzf_close(in_glf_fh);
Thanks in advance,
FGV
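
A note for readers who hit the same error, offered as a possible explanation rather than a confirmed diagnosis: in the BGZF API, bgzf_seek() expects a "virtual file offset" of the kind returned by bgzf_tell(), not a plain offset into the uncompressed data. A virtual offset stores the compressed-file position of a block's start in its upper 48 bits and the position inside that block's uncompressed data in its lower 16 bits. That would be consistent with start_pos = 0 working (virtual offset 0 is the start of the first block) while an arbitrary byte offset lands in the middle of a compressed block. The helper names below are hypothetical and only illustrate the bit layout; they are not part of the library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack a BGZF-style virtual offset.
 * block_coffset: byte offset of the block start in the COMPRESSED file
 * within_block:  byte offset inside that block's UNCOMPRESSED data */
static uint64_t make_voffset(uint64_t block_coffset, uint16_t within_block)
{
    /* The compressed offset has to fit in the upper 48 bits. */
    assert(block_coffset < ((uint64_t) 1 << 48));
    return (block_coffset << 16) | within_block;
}

/* Hypothetical helper: compressed-file offset of the block start. */
static uint64_t voffset_block(uint64_t voffset)
{
    return voffset >> 16;
}

/* Hypothetical helper: offset inside the uncompressed block. */
static uint16_t voffset_within(uint64_t voffset)
{
    return (uint16_t) (voffset & 0xFFFF);
}
```

In practice the virtual offsets to seek to would come from calling bgzf_tell() while writing or scanning the file and saving the values. Newer htslib releases also provide bgzf_useek() for seeking by uncompressed offset when a .gzi index is available; whether that is an option depends on the library version in use.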
What are start_pos, end_pos, and chunk_size? How do you know if your offset in bgzf_read is not "out of bounds"?

chunk_size is the amount of data I want to read (10000 in this case); start_pos and end_pos define the interval I want to read from the BGZIP file...

Cross-posted on the samtools-dev mailing list: http://sourceforge.net/mailarchive/message.php?msg_id=29974208
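
To make the "out of bounds" question concrete, here is a minimal sketch of the offset arithmetic, assuming each site occupies n_ind * 3 doubles as in the code above. The function names and the data_bytes parameter (total size of the uncompressed data) are hypothetical. The arithmetic is done in 64 bits because, depending on the type of pars->n_ind, part of the product in the original bgzf_seek() call may be evaluated in 32-bit unsigned arithmetic and wrap around. The result is an offset into the uncompressed data, not a BGZF virtual offset.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes occupied by one site: n_ind individuals x 3 doubles each. */
static uint64_t site_bytes(uint64_t n_ind)
{
    return n_ind * 3u * (uint64_t) sizeof(double);
}

/* Uncompressed byte offset of site number start_pos, computed in 64 bits. */
static uint64_t site_offset(uint64_t start_pos, uint64_t n_ind)
{
    uint64_t rec = site_bytes(n_ind);
    /* Guard against wrap-around even in 64-bit arithmetic. */
    assert(rec == 0 || start_pos <= UINT64_MAX / rec);
    return start_pos * rec;
}

/* Returns 1 if sites [start_pos, start_pos + chunk_size) all lie inside
 * data_bytes bytes of uncompressed data, and 0 otherwise. */
static int chunk_in_bounds(uint64_t start_pos, uint64_t chunk_size,
                           uint64_t n_ind, uint64_t data_bytes)
{
    uint64_t rec = site_bytes(n_ind);
    if (rec == 0)
        return 0;
    uint64_t n_sites = data_bytes / rec;   /* whole sites available */
    return start_pos <= n_sites && chunk_size <= n_sites - start_pos;
}
```

For example, with start_pos = 2203 * 10000 and 10 individuals, the byte offset is 5287200000, which no longer fits in a 32-bit unsigned int.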