I am using the ExpansionHunterDenovo package (https://github.com/Illumina/ExpansionHunterDenovo) to analyse large genome files. The command:
ExpansionHunterDenovo profile --reads in.bam --reference GRCh38.fa --output-prefix out
works fine when processing BAM files.
However, half of my samples are in CRAM format. This is a problem because CRAM decoding seems to be at odds with the way that HTSlib looks up reference data, which relates to the explanation at the bottom of this page (see "The REF_PATH and REF_CACHE" section): https://www.htslib.org/workflow/cram.html
Indeed the following command:
ExpansionHunterDenovo profile --reads in.cram --reference GRCh38.fa --output-prefix out
results in the error:
Failure to decode slice
This is a problem for me because executable binaries such as ExpansionHunterDenovo wrap precompiled versions of HTSlib (version 1.9 in this case), and hence there is no way to make the necessary changes to the code once the package has been compiled. I am wondering if there is a way to change the build instructions in e.g.: https://github.com/Illumina/ExpansionHunterDenovo/blob/master/source/Makefile ?
Or indeed to change the code before installing the package, in order to enable it to parse CRAM files? I also tried installing the package with conda alongside a later version of htslib to see if that worked, but it still gives the same error.
I wanted to try asking this here before submitting an issue on the maintainers' GitHub page, as I use many executables that wrap htslib and hence I hoped I could fix this on my side?
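For context on the REF_PATH and REF_CACHE section linked above: HTSlib reads both variables from the environment at run time, so they can be set without recompiling anything. A minimal sketch of that setup follows. The cache directory is a hypothetical choice, and whether this resolves the "Failure to decode slice" error for these particular files is an assumption, not something confirmed here.

```shell
# Hypothetical cache location; any writable directory would do.
CACHE_ROOT="${CACHE_ROOT:-$HOME/.cache/hts-ref}"
mkdir -p "$CACHE_ROOT"

# One-off step (left commented out): fill the cache from the FASTA with the
# seq_cache_populate.pl script that ships with samtools.
# seq_cache_populate.pl -root "$CACHE_ROOT" GRCh38.fa

# HTSlib fills in %2s/%2s/%s from the MD5 checksum of each reference sequence.
export REF_CACHE="$CACHE_ROOT/%2s/%2s/%s"
# Pointing REF_PATH at the local cache only means no remote lookup is attempted.
export REF_PATH="$REF_CACHE"

# Then re-run the failing command in the same shell:
# ExpansionHunterDenovo profile --reads in.cram --reference GRCh38.fa --output-prefix out
```

If the MD5 checksums in the CRAM header (the M5 tags on the @SQ lines, visible with samtools view -H in.cram) do not match the sequences in GRCh38.fa, the cache will not help, because the files were then encoded against a different reference.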
Convert to BAM using samtools prior to running ExpansionHunterDenovo?
I could do this, but these CRAM files are really large and I have thousands of them to process, so that would be impractical.
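If conversion does turn out to be the only route, the disk cost can at least be bounded by converting one sample at a time and deleting the BAM as soon as it has been profiled. A rough sketch, assuming a samtools build that can itself decode these CRAMs. `profile_cram` is a hypothetical helper name, not part of either tool, and the sketch does not index the temporary BAM.

```shell
# Hypothetical helper: profile one CRAM via a short-lived BAM.
# Usage: profile_cram <in.cram> <reference.fa> <output-prefix>
profile_cram() {
    cram="$1"; ref="$2"; prefix="$3"
    tmp_dir="$(mktemp -d)" || return 1
    tmp_bam="$tmp_dir/reads.bam"
    # -T names the reference used to decode the CRAM; -b writes BAM output.
    if samtools view -b -T "$ref" -o "$tmp_bam" "$cram"; then
        ExpansionHunterDenovo profile --reads "$tmp_bam" \
            --reference "$ref" --output-prefix "$prefix"
        status=$?
    else
        status=1
    fi
    # Remove the temporary BAM whether or not profiling succeeded.
    rm -rf "$tmp_dir"
    return "$status"
}

# Example loop over many samples (commented out; file names are illustrative):
# for f in *.cram; do profile_cram "$f" GRCh38.fa "${f%.cram}"; done
```

This still pays the CPU cost of decoding every CRAM once, but only one temporary BAM exists on disk at any moment.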