Question

Issue With CRAM -> BAM -> FASTQ Conversion

0

Entering edit mode

5.4 years ago

denis.k ▴ 20

Please help! I am trying to obtain fastq files from the GDSC, all we have in the lab is CRAM files. Unfortunately, the reference genome seems to not exist when pulled from an online source. I have attempted to use the Samtools code which allows me to put in my own reference but I get an error message

code used:

 samtools view -b  -T /GRCh37.p13.genome.fa -o /EGAR00001252191_13305_1.bam /EGAR00001252191_13305_1.cram

error message:

[W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/1b22b98cdeb4a9304cb5d48026a85128": Protocol not supported
[E::cram_get_ref] Failed to populate reference for id 0
[E::cram_decode_slice] Unable to fetch reference #0 10507..19965
[E::cram_next_slice] Failure to decode slice
[main_samview] truncated file.

Samtools v1.9, OS: Ubuntu 18.04.2 LTS

Any ideas? Thank you in advance!

samtools fastq cram bam view • 6.7k views

ADD COMMENT • link updated 19 months ago by Michael • 0 • written 5.4 years ago by denis.k ▴ 20

0

Entering edit mode

I assume this data is coming from European Genome Phenome Archive? You should email datasharing at sanger.ac.uk to get assistance with reference.

ADD REPLY • link 5.4 years ago by GenoMax 147k

0

Entering edit mode

Hello,

how have you installed samtools? Is htslib installed as well?

Related: https://github.com/samtools/samtools/issues/857

fin swimmer

ADD REPLY • link 5.4 years ago by finswimmer 16k

score 4 · Answer 1 · 2019-07-17

4

Entering edit mode

5.4 years ago

jkbonfield ★ 1.3k

Has your copy of samtools (htslib actually) been compiled with curl support? If you just did "make" the answer will probably be no. You need ./configure;make. It's dumb and an annoyance of mine that we have two different build routes that by default choose two different combinations of features. Argh. This is almost certainly what the "protocol not supported" means as without curl it won't be able to handle https; only http.

If you do indeed have a curl capable samtools, then is it maybe failing due to a firewall rule? You can use htslib's "hts_file -vvvvv -c in.cram > /dev/null" to read a CRAM file with maximum verbosity enabled, which may shed some light on the manner. (Sadly samtools itself doesn't have any way to enter htslib debug mode, unless you attach a debugger to it.)

ADD COMMENT • link 5.4 years ago by jkbonfield ★ 1.3k

1

Entering edit mode

Thank you! It was a lib curl problem! Turns out that I just needed the --enable lib curl option. Such a silly problem. Thank you for all the help

ADD REPLY • link 5.3 years ago by denis.k ▴ 20

0

Entering edit mode

Hi,

In our institute, the tools are all installed on a cluster by the systems admin, so I don't know whether or not ./configure was run. Could you please tell me at which step did you use --enable lib curl? I'm guessing you used it with samtools view -b -T /GRCh37.p13.genome.fa -o /EGAR00001252191_13305_1.bam /EGAR00001252191_13305_1.cram, but in samtools-1.9 (the version on our cluster), there is no --enable lib curl option under samtools view.

ADD REPLY • link 3.7 years ago by vinayjrao ▴ 250

score 2 · Answer 2 · 2021-03-25

2

Entering edit mode

3.7 years ago

bw. ▴ 260

For anyone else that runs into this issue, the following worked for me:

wget https://storage.googleapis.com/gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta
wget https://storage.googleapis.com/gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai 
wget https://storage.googleapis.com/gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.ref_cache.tar.gz

tar xzf Homo_sapiens_assembly38.ref_cache.tar.gz
export REF_PATH="$(pwd)/ref/cache/%2s/%2s/%s:http://www.ebi.ac.uk/ena/cram/md5/%s"
export REF_CACHE="$(pwd)/ref/cache/%2s/%2s/%s"

...

{command that was generating the '[W::find_file_url] Failed to open reference..' error}

ADD COMMENT • link 3.7 years ago by bw. ▴ 260

0

Entering edit mode

is there somemthing similar for hg19?

ADD REPLY • link 3.4 years ago by User000 ▴ 710

1

Entering edit mode

Broad seems to have removed all hg19 files from their google public bucket. Other people have asked about these and we have told them to ask Broad tech support. No one has come back and posted a response as far as I recollect. If you are willing, please do that. It would be useful for all.