Hi,
I am looking for advice about transitioning from bam/bai to cram for archival purposes. General advice is appreciated, but I'm specifically looking for answers to these two questions:
- Does samtools offer the best performance for converting to and from CRAMs?
- Do people have to re-index their BAM after converting from CRAM?
# 1) Convert BAM to CRAM
samtools view -T "${FA_REF}" -C -o "${cram}" "${bam}"
# 2) Store CRAM & discard *.bam/*.bai
# 3) Retrieve CRAM from storage and convert to BAM
samtools view -T "${FA_REF}" -b -o "${bam}" "${cram}"
# 4) Re-index BAM to include CRAM-added headers (M5/UR)
sambamba index "${bam}"
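Since the goal is a functionally equivalent BAM rather than a byte-identical one, one way to check a round trip is to compare only the alignment records and ignore the headers. The sketch below is a minimal illustration, not a definitive check: `records_md5` and `same_records` are hypothetical helper names, and it assumes `samtools` and `md5sum` are on the PATH. A mismatch does not necessarily mean data loss, since tag order or regenerated tags could differ after a CRAM round trip.

```shell
#!/usr/bin/env bash
# Sketch: compare alignment records (not headers) of two BAM/CRAM files.
# records_md5 and same_records are hypothetical helper names.

# Print the MD5 of the header-less SAM records of one file.
# Usage: records_md5 <file> [reference.fa]
records_md5() {
    local aln="$1" ref="${2:-}"
    if [ -n "$ref" ]; then
        # A reference is needed to decode CRAM input.
        samtools view -T "$ref" "$aln" | md5sum | cut -d' ' -f1
    else
        samtools view "$aln" | md5sum | cut -d' ' -f1
    fi
}

# Succeed (exit 0) only if both files yield the same record checksum.
# Usage: same_records <file_a> <file_b> [reference.fa]
same_records() {
    local a b
    a=$(records_md5 "$1" "${3:-}")
    b=$(records_md5 "$2" "${3:-}")
    [ "$a" = "$b" ]
}
```

For example, `same_records original.bam restored.bam && echo "records match"` would compare the BAM from before archiving with the one restored from CRAM (both file names are placeholders).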
The re-indexing of the BAM, rather than storing and re-using the original BAM's index, seems like something I should be able to avoid somehow, but htsjdk is unable to read the new BAM with the old index. I've looked into samtools calmd to add the M5 & UR headers to the BAM before converting to CRAM, but it seems much faster to just re-index with sambamba.
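One variant that might remove the separate indexing step is to have samtools write the index in the same pass as the CRAM-to-BAM conversion. The sketch below assumes a samtools version that supports `--write-index` (1.10 or newer, as far as I know); `cram_to_indexed_bam` is a hypothetical helper name. The index written this way may be CSI rather than BAI depending on the samtools version and defaults, so whether htsjdk accepts it would need checking.

```shell
#!/usr/bin/env bash
# Sketch: CRAM -> BAM with the index written during the same pass.
# cram_to_indexed_bam is a hypothetical helper name; --write-index is
# assumed to be available in the installed samtools.
# Usage: cram_to_indexed_bam <reference.fa> <in.cram> <out.bam> [extra_threads]
cram_to_indexed_bam() {
    local ref="$1" cram="$2" bam="$3" threads="${4:-0}"
    samtools view -@ "$threads" -T "$ref" -b --write-index -o "$bam" "$cram"
}
```

It would be called as `cram_to_indexed_bam "${FA_REF}" "${cram}" "${bam}" 4`, using the same variables as the steps above.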
Just as a note, the reasons I'm focused on getting functionally equivalent BAMs, rather than using CRAMs directly or trying to get identical BAMs, are:
- previous posts indicate md5sum-identical BAMs aren't possible
- this GATK post advises that using CRAMs directly in pipelines can cause slowdown
If people have experience with either of these assumptions being incorrect, please let me know. Thanks in advance!