I have a blocked gzip (BGZF) file where the data I want lies between two byte offsets, which I determined using Biopython's BgzfReader fh.tell() function. I can easily access this data with this code:
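from Bio.bgzf import BgzfReader

# Offsets previously obtained from BgzfReader.tell()
start = 75191629497
stop = 75191634445

# bgzip_path is the path to the BGZF-compressed file (defined elsewhere)
with BgzfReader(bgzip_path) as fh_reads:
    fh_reads.seek(start)
    for line in fh_reads:
        # Stop once the reader has moved past the end offset
        if fh_reads.tell() > stop:
            break
        print(line, end="")  # each line already ends with a newline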
The code above works perfectly and prints out the expected data.
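For reference, Biopython documents BgzfReader.tell() as returning a BGZF virtual offset rather than a plain byte position: the compressed start of the current block shifted left by 16 bits, combined with the offset inside that block's decompressed data. The minimal sketch below decodes the two values above under that assumption; the helper name split_virtual_offset mirrors Biopython's own function but is reimplemented here so the snippet runs without Biopython installed.

```python
# Sketch: decode the BGZF virtual offsets reported by BgzfReader.tell().
# Assumption: they use the BGZF layout
#   virtual_offset = (compressed block start << 16) | offset within uncompressed block

def split_virtual_offset(voffset):
    """Return (compressed block start, offset within the uncompressed block)."""
    return voffset >> 16, voffset & 0xFFFF

start = 75191629497
stop = 75191634445

start_block, start_within = split_virtual_offset(start)
stop_block, stop_within = split_virtual_offset(stop)

print("start:", start_block, start_within)  # start: 1147333 14009
print("stop:", stop_block, stop_within)     # stop: 1147333 18957

# Only when both offsets fall in the same block does the difference
# of the within-block parts equal the number of uncompressed bytes.
if start_block == stop_block:
    print("uncompressed bytes:", stop_within - start_within)  # uncompressed bytes: 4948
```

Both offsets land in the same compressed block (starting at byte 1147333), so the 4948-byte difference happens to be a valid uncompressed length here, but neither number is a byte position in the compressed file or the uncompressed stream. Biopython's Bio.bgzf.split_virtual_offset and make_virtual_offset perform the same conversion.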
My problem is that these offsets do not work with other htslib utilities. For example, the bgzip command-line utility has a -b option for the start byte offset and a -s option for the size of the data to decompress. In the example above, the size would be 75191634445 - 75191629497 = 4948 bytes, so I tried the following:
bgzip -c -b 75191629497 -s 4948 /path/to/bgzip
This command doesn't work; it fails with a "Segmentation fault (core dumped)" error. My question is: can the byte positions generated and used by Biopython's BgzfReader be used with other htslib-based applications? If so, how would I do this? Thanks.