Neither pysam nor samtools reading HTTPS-sourced BAM file correctly
15 months ago

I have a BAM file and its index on a web host:

$ ll HUDEP.control.DS182418.chr22.bam*
-rw-r--r-- 1 areynolds stamlab 14046407041 Jul 31 21:57 HUDEP.control.DS182418.chr22.bam
-rw-r--r-- 1 areynolds stamlab     1340656 Jul 31 21:57 HUDEP.control.DS182418.chr22.bam.bai

As a positive control, I can query this file via the HTTPS protocol from a web application using bam-js.

When I query this file from a Python script using pysam, the query aborts with a malloc error:

[E::idx_test_and_fetch] Error reading "https://resources.altius.org/~areynolds/public/fiberview/072823/HUDEP.control.DS182418.chr22.bam.bai"
python(9882,0x7ff855ff3700) malloc: double free for ptr 0x7fbe00148000
python(9882,0x7ff855ff3700) malloc: *** set a breakpoint in malloc_error_break to debug

This is the Python code I am using to perform the query:

import pysam

seqname = 'chr22'
start = 30265855
end = 30267901
bam_url = 'https://resources.altius.org/~areynolds/public/fiberview/072823/HUDEP.control.DS182418.chr22.bam'
# Open the remote BAM; the matching .bam.bai index is downloaded to the working directory.
with pysam.AlignmentFile(bam_url, 'rb') as bam_fh:
    # fetch() takes 0-based, half-open coordinates.
    for read in bam_fh.fetch(seqname, start, end):
        print(str(read))

The BAI file that pysam downloads locally is not the same size as what is on the web host (1048576 vs 1340656 bytes):

% ll *.bai
-rw-r--r--    1 areynolds  staff    1048576 Sep  1 12:13 HUDEP.control.DS182418.chr22.bam.bai
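
The size mismatch can also be checked from a script by comparing the local file size with the Content-Length that the server reports. This is only a minimal sketch: it assumes the host answers HEAD requests with a Content-Length header, and the helper names `remote_size` and `is_truncated` are made up for illustration (they are not part of pysam or htslib).

```python
import urllib.request


def remote_size(url):
    """Return the Content-Length reported for url, or None if the header is absent."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None


def is_truncated(local_size, expected_size):
    """True when the local copy is shorter than the expected size."""
    return local_size < expected_size


# Sizes taken from the two listings above.
local_bytes = 1048576
host_bytes = 1340656
print(is_truncated(local_bytes, host_bytes))
# The local copy is exactly 1 MiB, which may hint that the transfer
# stopped at a buffer or chunk boundary.
print(local_bytes == 2 ** 20)
```

In practice the first argument would come from `os.path.getsize()` on the downloaded index and the second from `remote_size()` on the `.bam.bai` URL.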

I'm running Python 3.8.13 with pysam v0.16.0.1 on macOS Catalina (v13.5.1).

I have tried a similar query with samtools v1.16.1 and get similar errors:

% samtools view https://resources.altius.org/~areynolds/public/fiberview/072823/HUDEP.control.DS182418.chr22.bam.bai chr22:30265855-30267901
[E::hts_hopen] Failed to open file https://resources.altius.org/~areynolds/public/fiberview/072823/HUDEP.control.DS182418.chr22.bam.bai
[E::hts_open_format] Failed to open file "https://resources.altius.org/~areynolds/public/fiberview/072823/HUDEP.control.DS182418.chr22.bam.bai" : Inappropriate file type or format
samtools view: failed to open "https://resources.altius.org/~areynolds/public/fiberview/072823/HUDEP.control.DS182418.chr22.bam.bai" for reading: Inappropriate file type or format

Before I start trying to migrate to another host, is there something obvious I'm doing wrong?

Is there a special way to compile samtools or install pysam required to support HTTPS-based queries?

I have recompiled htslib and samtools and still get the same errors:

% samtools --version
samtools 1.18-7-gae19296
Using htslib 1.18-17-g5acbc15-dirty

...

HTSlib URL scheme handlers present:
    built-in:    preload, data, file
    S3 Multipart Upload:     s3w, s3w+https, s3w+http
    Amazon S3:   s3+https, s3+http, s3
    Google Cloud Storage:    gs+http, gs+https, gs
    libcurl:     imaps, pop3, gophers, http, smb, gopher, ftps, imap, smtp, smtps, rtsp, ftp, telnet, mqtt, ldap, https, ldaps, smbs, tftp, pop3s, dict
    ...
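
The listing above includes https among the libcurl handlers, so this build should in principle be able to open HTTPS URLs. The same check can be scripted. This is a rough sketch that assumes the version text keeps the "name: scheme, scheme, ..." layout shown above; `libcurl_schemes` is a hypothetical helper and the sample text is abbreviated from that listing.

```python
def libcurl_schemes(version_text):
    """Return the set of URL schemes listed on the 'libcurl:' line, if any."""
    for line in version_text.splitlines():
        name, sep, schemes = line.strip().partition(":")
        if sep and name.strip() == "libcurl":
            return {s.strip() for s in schemes.split(",") if s.strip()}
    return set()


# Abbreviated copy of the handler listing; not the full output.
sample = """HTSlib URL scheme handlers present:
    built-in:    preload, data, file
    libcurl:     imaps, pop3, http, ftp, https
"""
print("https" in libcurl_schemes(sample))
```

To run it against a real build, the text could be captured with `subprocess.run(["samtools", "--version"], capture_output=True, text=True).stdout`.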

Then:

% samtools view https://resources.altius.org/~areynolds/public/fiberview/072823/HUDEP.control.DS182418.chr22.bam.bai chr22:30265855-30267901
[E::hts_hopen] Failed to open file https://resources.altius.org/~areynolds/public/fiberview/072823/HUDEP.control.DS182418.chr22.bam.bai
[E::hts_open_format] Failed to open file "https://resources.altius.org/~areynolds/public/fiberview/072823/HUDEP.control.DS182418.chr22.bam.bai" : Inappropriate file type or format
samtools view: failed to open "https://resources.altius.org/~areynolds/public/fiberview/072823/HUDEP.control.DS182418.chr22.bam.bai" for reading: Inappropriate file type or format

I also tried an S3-based query, which also failed. Do samtools and derivatives work on macOS Catalina?


In both samtools examples you seem to be opening the .bai file instead of the .bam file.

Opening the same link as a .bam file seems to work.


Sorry, that was a typo. If I run the pysam example with the correct link, I get an error about the BAI:

[E::idx_test_and_fetch] Error reading "https://resources.altius.org/~areynolds/public/fiberview/072823/HUDEP.control.DS182418.chr22.bam.bai"
python(20854,0x7ff855ff3700) malloc: double free for ptr 0x7f96e0098000
python(20854,0x7ff855ff3700) malloc: *** set a breakpoint in malloc_error_break to debug

The BAI file that pysam brings over to my working directory is truncated, as compared with the original.

I'll probably just file an issue. There may be another issue open that is similar.


I feel that there are several different, overlapping issues here. The "Inappropriate file type or format" error is due to the typo, and it complicates identifying the actual problem.

If I run

samtools view -c https://resources.altius.org/~areynolds/public/fiberview/072823/HUDEP.control.DS182418.chr22.bam chr22:30265855-30267901 

it produces the output

50

so I would say that samtools can access HTTPS resources. It would be very unexpected, and a widely reported issue, if it couldn't handle HTTPS.


PS: Also, if I run your Python code with the .bam URL, it runs fine with no error and produces a full-size index file:

$ ls -l HUDEP.control.DS182418.chr22.bam.bai
-rw-r--r-- 1 ialbert ialbert 1340656 Sep  3 08:55 HUDEP.control.DS182418.chr22.bam.bai

Both examples were tested on Linux and Mac (Ventura).

PS: I wonder how big the .bai file would be if you fetched it via curl or wget.
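
The same test can be done from Python with only the standard library, which takes htslib out of the picture. This is a sketch rather than a fix: `download` and `fetch_with_local_index` are hypothetical helper names, and the second function assumes your pysam version accepts the `index_filename` argument of `AlignmentFile`.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url, dest):
    """Stream url to dest and return the number of bytes written."""
    dest = Path(dest)
    with urllib.request.urlopen(url) as response, dest.open("wb") as out:
        shutil.copyfileobj(response, out)
    return dest.stat().st_size


def fetch_with_local_index(bam_url, index_path, contig, start, end):
    """Query a remote BAM using an index file that is already on disk."""
    import pysam  # imported here so download() works without pysam installed

    with pysam.AlignmentFile(bam_url, "rb", index_filename=str(index_path)) as bam:
        return [read.query_name for read in bam.fetch(contig, start, end)]
```

If `download()` on the `.bam.bai` URL returns 1340656 bytes, the host is serving the full index and the truncation happens on the htslib side. Passing that complete local copy to `fetch_with_local_index()` might then work around the problem.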


An aside: great English grammar ('neither'... 'nor')
