Question

Best Practices for CRAM <-> BAM

23 months ago

DavidStreid ▴ 90

Hi,

I am looking for advice about transitioning from BAM/BAI to CRAM for archival purposes. General advice is appreciated, but I'm specifically looking for answers to these two questions:

1. Does samtools offer the best performance for converting to and from CRAM?
2. Do people have to re-index their BAM after converting from CRAM?

# 1) Convert BAM to CRAM
samtools view -T "${FA_REF}" -C -o "${cram}" "${bam}"

# 2) Store CRAM & discard *.bam/*.bai

# 3) Retrieve CRAM from storage and convert to BAM
samtools view -T "${FA_REF}" -b -o "${bam}" "${cram}"

# 4) Re-index the new BAM (its header now carries the CRAM-added M5/UR tags)
sambamba index "${bam}"
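
The steps above could be wrapped in two small shell functions. This is only a sketch: the function names `bam_to_cram` and `cram_to_bam`, the `run` helper, and the `DRY_RUN` switch are made up for illustration, and it assumes samtools and sambamba are on the PATH.

```shell
# Sketch of the BAM -> CRAM -> BAM round trip described above.
# Function names and DRY_RUN are illustrative, not part of samtools or sambamba.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# bam_to_cram REF.fa IN.bam OUT.cram
bam_to_cram() {
  run samtools view -T "$1" -C -o "$3" "$2"
}

# cram_to_bam REF.fa IN.cram OUT.bam
# Converts back to BAM, then builds a fresh index for the new file.
cram_to_bam() {
  run samtools view -T "$1" -b -o "$3" "$2" &&
  run sambamba index "$3"
}
```

Calling `DRY_RUN=1 bam_to_cram "${FA_REF}" "${bam}" "${cram}"` prints the command instead of running it, which is a cheap way to check the arguments before touching archived data.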

Re-indexing the BAM, rather than storing and reusing the original BAM's index, seems like something I should be able to avoid, but htsjdk is unable to read the new BAM with the old index. I've looked into using samtools calmd to add the M5 and UR headers to the BAM before converting to CRAM, but it seems much faster to just re-index with sambamba.

As a note, I'm focused on getting functionally equivalent BAMs, rather than using CRAMs directly or trying to get identical BAMs, for two reasons:

- Previous posts indicate that md5sum-identical BAMs aren't possible.
- This GATK post advises that using CRAMs directly in pipelines can cause slowdowns.
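
One possible way to test "functionally equivalent" without requiring md5sum-identical files is to hash only the alignment records and ignore the header, since the header is where the CRAM-added M5/UR tags end up. The sketch below assumes that identical records are an acceptable definition of equivalence. The helper names `sam_body_md5` and `same_records` are hypothetical, and it assumes `md5sum` and samtools are available. It will report a mismatch if the round trip changes anything in the records themselves.

```shell
# Hash only the alignment records of a SAM text stream read from stdin.
# Header lines start with '@'; record lines cannot, because '@' is not
# allowed as the first character of a read name.
sam_body_md5() {
  grep -v '^@' | md5sum | cut -d' ' -f1
}

# same_records A.bam B.bam
# Succeeds when both files contain identical alignment records.
same_records() {
  [ "$(samtools view -h "$1" | sam_body_md5)" = "$(samtools view -h "$2" | sam_body_md5)" ]
}
```

For example, `same_records original.bam roundtrip.bam && echo equivalent` would compare the original BAM with the one regenerated from CRAM; the two file names are placeholders.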

If people have experience with either of these assumptions being incorrect, please let me know. Thanks in advance!

sambamba bam cram samtools • 1.1k views

updated 23 months ago by GenoMax 150k • written 23 months ago by DavidStreid ▴ 90