I have a BAM file and its index on a web host:
$ ll HUDEP.control.DS182418.chr22.bam*
-rw-r--r-- 1 areynolds stamlab 14046407041 Jul 31 21:57 HUDEP.control.DS182418.chr22.bam
-rw-r--r-- 1 areynolds stamlab 1340656 Jul 31 21:57 HUDEP.control.DS182418.chr22.bam.bai
As a positive control, I can query this file over HTTPS from a web application using bam-js.
When I query it on the command line or via a Python script running pysam, the pysam query quits with a malloc error:
[E::idx_test_and_fetch] Error reading "https://resources.altius.org/~areynolds/public/fiberview/072823/HUDEP.control.DS182418.chr22.bam.bai"
python(9882,0x7ff855ff3700) malloc: double free for ptr 0x7fbe00148000
python(9882,0x7ff855ff3700) malloc: *** set a breakpoint in malloc_error_break to debug
This is the Python code I am using to perform the query:
import pysam

seqname = 'chr22'
start = 30265855
end = 30267901
bam_url = 'https://resources.altius.org/~areynolds/public/fiberview/072823/HUDEP.control.DS182418.chr22.bam'

# Open the remote BAM; the matching .bai index is retrieved over HTTPS as well.
with pysam.AlignmentFile(bam_url, 'rb') as bam_fh:
    # Iterate over the reads overlapping the query region.
    for read in bam_fh.fetch(seqname, start, end):
        print(str(read))
The BAI file that pysam downloads locally is not the same size as the one on the web host (1048576 vs. 1340656 bytes):
% ll *.bai
-rw-r--r-- 1 areynolds staff 1048576 Sep 1 12:13 HUDEP.control.DS182418.chr22.bam.bai
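To make the size mismatch concrete, here is a minimal sketch that compares the two byte counts quoted above. The function name `describe_truncation` is hypothetical and only for illustration. One thing it shows is that the local copy is exactly 2**20 bytes (1 MiB), which may hint that the download stopped at a buffer or chunk boundary, although that is only a guess.

```python
def describe_truncation(local_size, remote_size):
    """Summarize how a local copy differs in size from the remote original."""
    mib = 1024 * 1024
    return {
        # Bytes present on the host but absent from the local copy.
        "missing_bytes": remote_size - local_size,
        # Share of the remote file that actually arrived.
        "fraction_received": local_size / remote_size,
        # True when the local size is an exact multiple of 1 MiB.
        "local_is_whole_mib": local_size % mib == 0,
    }


# Sizes taken from the two directory listings in the question.
report = describe_truncation(local_size=1048576, remote_size=1340656)
print(report)
```

A size that lands exactly on a power-of-two boundary is more consistent with an interrupted transfer than with a different file being served.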
I'm running Python 3.8.13 with pysam v0.16.0.1 on macOS Catalina (v13.5.1).
I have tried a similar query with samtools v1.16.1 and get similar errors:
% samtools view https://resources.altius.org/~areynolds/public/fiberview/072823/HUDEP.control.DS182418.chr22.bam.bai chr22:30265855-30267901
[E::hts_hopen] Failed to open file https://resources.altius.org/~areynolds/public/fiberview/072823/HUDEP.control.DS182418.chr22.bam.bai
[E::hts_open_format] Failed to open file "https://resources.altius.org/~areynolds/public/fiberview/072823/HUDEP.control.DS182418.chr22.bam.bai" : Inappropriate file type or format
samtools view: failed to open "https://resources.altius.org/~areynolds/public/fiberview/072823/HUDEP.control.DS182418.chr22.bam.bai" for reading: Inappropriate file type or format
Before I start trying to migrate to another host, is there something obvious I'm doing wrong?
Is there a special way to compile samtools or install pysam that is required to support HTTPS-based queries?
I have recompiled htslib and samtools and still get the same errors:
% samtools --version
samtools 1.18-7-gae19296
Using htslib 1.18-17-g5acbc15-dirty
...
HTSlib URL scheme handlers present:
built-in: preload, data, file
S3 Multipart Upload: s3w, s3w+https, s3w+http
Amazon S3: s3+https, s3+http, s3
Google Cloud Storage: gs+http, gs+https, gs
libcurl: imaps, pop3, gophers, http, smb, gopher, ftps, imap, smtp, smtps, rtsp, ftp, telnet, mqtt, ldap, https, ldaps, smbs, tftp, pop3s, dict
...
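One way to confirm that a given build lists an `https` handler is to parse the "HTSlib URL scheme handlers present" section of the `samtools --version` output. The sketch below assumes the layout quoted above (one `handler: scheme, scheme, ...` line per handler after the section header). The function names and the abbreviated `SAMPLE` text are illustrative, not part of samtools.

```python
def url_schemes(version_text):
    """Map each handler name to the set of URL schemes it lists."""
    schemes = {}
    in_section = False
    for line in version_text.splitlines():
        if line.strip().startswith("HTSlib URL scheme handlers present"):
            in_section = True
            continue
        # Skip everything before the section and any line without a colon.
        if not in_section or ":" not in line:
            continue
        handler, _, names = line.partition(":")
        schemes[handler.strip()] = {n.strip() for n in names.split(",") if n.strip()}
    return schemes


def supports_scheme(version_text, scheme):
    """Return True if any handler lists exactly this scheme name."""
    return any(scheme in names for names in url_schemes(version_text).values())


# Abbreviated excerpt of the output shown in the question.
SAMPLE = """\
samtools 1.18-7-gae19296
Using htslib 1.18-17-g5acbc15-dirty
HTSlib URL scheme handlers present:
    built-in: preload, data, file
    Amazon S3: s3+https, s3+http, s3
    libcurl: http, https, ftp, ftps
"""

print(supports_scheme(SAMPLE, "https"))
```

For the build quoted here, `https` appears under libcurl, so missing HTTPS support in the binary itself looks unlikely to be the cause.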
Then:
% samtools view https://resources.altius.org/~areynolds/public/fiberview/072823/HUDEP.control.DS182418.chr22.bam.bai chr22:30265855-30267901
[E::hts_hopen] Failed to open file https://resources.altius.org/~areynolds/public/fiberview/072823/HUDEP.control.DS182418.chr22.bam.bai
[E::hts_open_format] Failed to open file "https://resources.altius.org/~areynolds/public/fiberview/072823/HUDEP.control.DS182418.chr22.bam.bai" : Inappropriate file type or format
samtools view: failed to open "https://resources.altius.org/~areynolds/public/fiberview/072823/HUDEP.control.DS182418.chr22.bam.bai" for reading: Inappropriate file type or format
I also tried an S3-based query, which also failed. Do samtools and derivatives work on macOS Catalina?
In both examples you seem to be opening a .bai file instead of a .bam file.
Opening the same link as a .bam file seems to work.
Sorry, this is a typo. If I run the pysam example with the correct link, I get an error about the BAI:
The BAI file that pysam brings over to my working directory is truncated, as compared with the original.
I'll probably just file an issue. There may be another issue open that is similar.
I feel that there are multiple different and overlapping issues here. The error with the inappropriate file format is due to the typo, but it kind of overcomplicates the process of identifying the problem.
If I run
it produces the output
So with that, I would say samtools can access HTTPS resources. It would be very unexpected, and obviously a commonly reported issue, if it couldn't handle HTTPS.
PS: Also, if I run your Python code with the corrected BAM file, it runs fine with no error and produces
Both examples were tested on Linux and Mac (Ventura).
PS: I wonder how big the .bai file would be if you fetched it via curl/wget.
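The same check can be done from Python without going through htslib. The sketch below uses only the standard library to download a URL and compare the number of bytes written with the `Content-Length` the server advertises. The function name `download_and_check` and the destination filename are hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def download_and_check(url, dest):
    """Download url to dest.

    Returns (bytes_written, advertised_length), where advertised_length is
    the Content-Length header as an int, or None if the server sent none.
    """
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        advertised = resp.headers.get("Content-Length")
        # Stream the body to disk in chunks rather than loading it in memory.
        shutil.copyfileobj(resp, out)
    written = Path(dest).stat().st_size
    return written, (int(advertised) if advertised is not None else None)
```

Calling `download_and_check(bai_url, "check.bai")` with the index URL from the question and comparing both numbers against 1340656 would show whether the truncation happens in a plain HTTP download too, or only when htslib fetches the index.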
An aside: great English grammar ('neither'... 'nor')