Hi there, I'm facing the task of merging the CRAM files for 25 human samples.
Each one is divided into 12-13 CRAM files (322 individual CRAMs in total), which I have named code_number, where the code is the sample identifier and the number is the CRAM partition.
Now, I'm aware samtools can do this; however, I have limited experience with CRAM files, let alone with merging a large number of them. So my question is: what is the exact command I should use, e.g.
Do I need the .crai index for such an operation, and what is the best output format, say BAM rather than a merged CRAM? Still, I will then need to use the reference to get back to FASTQ.
Thanks for the feedback. However, for some reason running with --input-fmt-option CRAM causes the following error:
[E::hts_opt_add] Unknown option 'CRAM'
Usage: samtools merge [-nurlf] [-h inh.sam] [-b <bamlist.fofn>] <out.bam> <in1.bam> [<in2.bam> ... <inN.bam>]
Options:
-n Input files are sorted by read name
-t TAG Input files are sorted by TAG value
-r Attach RG tag (inferred from file names)
-u Uncompressed BAM output
-f Overwrite the output BAM if exist
-1 Compress level 1
-l INT Compression level, from 0 to 9 [-1]
-R STR Merge file in the specified region STR [all]
-h FILE Copy the header in FILE to <out.bam> [in1.bam]
-c Combine @RG headers with colliding IDs [alter IDs to be distinct]
-p Combine @PG headers with colliding IDs [alter IDs to be distinct]
-s VALUE Override random seed
-b FILE List of input BAM filenames, one per line [null]
-X Use customized index files
-L FILE Specify a BED file for multiple region filtering [null]
--no-PG do not add a PG line
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
-O, --output-fmt FORMAT[,OPT[=VAL]]...
Specify output format (SAM, BAM, CRAM)
--output-fmt-option OPT[=VAL]
Specify a single output file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
-@, --threads INT
Number of additional threads to use [0]
--write-index
Automatically index the output files [off]
--verbosity INT
Set level of verbosity
The format (CRAM/BAM/SAM) is detected automatically, so you don't need --input-fmt-option, which in any case doesn't work like --input-fmt-option CRAM. It takes key=value syntax; see http://www.htslib.org/doc/samtools.html
--input-fmt-option is for options, not the format. It was added as a way of specifying the reference sequence for commands that read CRAMs but had no way to specify a reference, e.g. --input-fmt-option reference=ref.fa.
You don't need to specify the input file type as htslib will auto-detect it.
It will also detect the output file type from the filename, but if you are writing to stdout or to a non-standard name, you can use --output-fmt cram, or -O cram for short. You can also add format options here, e.g. -O cram,embed_ref,use_bzip2.
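To make the per-sample merge concrete, here is a rough dry-run sketch that only prints one samtools merge command per sample, so the commands can be checked before anything is run. It assumes things the thread does not state: that the parts are named code_number.cram, sit in one directory, have no whitespace in their names, and that ref.fa is the reference the CRAMs were encoded against. The function name print_merge_commands, the output directory argument, the .merged.cram suffix and the thread count of 4 are all made up for illustration. The command layout follows the usage text quoted above (output file first, then the inputs).

```shell
#!/bin/sh
# Dry run: print one "samtools merge" command per sample, do not execute it.
# Assumed naming (hypothetical): <code>_<number>.cram, all in one directory.
# Arguments: <input dir> <reference FASTA> <output dir>
print_merge_commands() (
    dir=$1; ref=$2; outdir=$3
    # Collect the distinct sample codes: file name minus the trailing _<number>.cram
    for f in "$dir"/*_*.cram; do
        [ -e "$f" ] || continue          # no matching files: glob stays literal
        base=$(basename "$f" .cram)
        printf '%s\n' "${base%_*}"
    done | sort -u | while IFS= read -r code; do
        # Output file comes first, then every part belonging to this sample
        printf 'samtools merge -@ 4 --reference %s -O cram %s/%s.merged.cram' \
            "$ref" "$outdir" "$code"
        for f in "$dir/$code"_*.cram; do
            base=$(basename "$f" .cram)
            # Skip files of another sample whose code merely starts with this one
            if [ "${base%_*}" = "$code" ]; then
                printf ' %s' "$f"
            fi
        done
        printf '\n'
    done
)
```

Calling print_merge_commands with the input directory, ref.fa and an output directory prints the commands; once they look right, the output can be saved to a file and run with sh. Swapping -O cram for -O bam in the printed command would give BAM output instead.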