Question on samtools view with --fast option
13 months ago
tamu.anand ▴ 30

Hi

I have a question about samtools view with the --fast option. I looked for docs or blog posts describing its usage and how best to use it, but could not find any, so I thought I would ask the Biostars community.

samtools view --help

Usage: samtools view [options] <in.bam>|<in.sam>|<in.cram> [region ...]

Output options:

  -1, --fast                 Use fast BAM compression (and default to --bam)

  • Has anyone had experience using this?
  • In which situations will this option be useful?

To give some context around the above: I am working on WGS data using Parabricks.

  • working on a g4dn.12xlarge (4 GPUs, 48 CPUs, 192 GB RAM and 900 GB NVMe SSD) - https://aws.amazon.com/ec2/instance-types/g4/
    • step 1 is to use pb fq2bam --align-only to get sample_id.bam
    • step 2 is the post-alt process below, which I am trying to speed up using a combination of GNU parallel and samtools view

This process takes quite a bit of time and I would appreciate any tips on speedup - hence the question on samtools view --fast too

samtools view -H sample_id.bam > header

samtools view sample_id.bam | split -l 10000000 - split_bam_

ls split_bam_* \
        | parallel -j 8 "/usr/local/bin/k8 /usr/local/bin/bwa-postalt.js ref_fasta.alt {}" \
        | cat header - \
        | samtools view --threads 8 -o sample_id.postalt.bam

Thanks in advance.

parallel samtools fq2bam bam compression • 1.5k views
13 months ago

In which situations will this option be useful

When you want to produce a temporary BAM file that will be read back soon and then deleted. The file will be larger than one written at the default compression level, so it is not something you would want to keep.

Another way is to use --uncompressed (no compression at all).
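
A minimal sketch of both cases, assuming a reasonably recent samtools where the long options --fast and --uncompressed are available (-1 and -u are the short forms). The function names, file names, and thread count are placeholders and not taken from the original post.

```shell
# write_tmp_bam IN OUT
# Write a temporary BAM with fast (level 1) compression: quicker to write,
# larger on disk, so meant to be deleted once the next step has read it.
write_tmp_bam() {
    samtools view --fast --threads 4 -o "$2" "$1"
}

# stream_uncompressed IN
# Emit uncompressed BAM on stdout, for piping straight into another tool.
stream_uncompressed() {
    samtools view --uncompressed "$1"
}
```

For example: write_tmp_bam sample_id.bam sample_id.tmp.bam, or stream_uncompressed sample_id.bam | samtools sort -o sample_id.sorted.bam -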

This process takes quite a bit of time and I would appreciate any tips on speedup

While parallel is a great tool, you should use a workflow manager (Snakemake, Nextflow) and split a BAM file instead of a SAM file, e.g. with Picard SplitSamByNumberOfReads (https://gatk.broadinstitute.org/hc/en-us/articles/360041849771-SplitSamByNumberOfReads-Picard-) if the file is sorted on read name.
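
One way the split-BAM idea could look for the post-alt step, as a rough sketch rather than a tested pipeline. It assumes k8 is on PATH, that bwa-postalt.js reads SAM from stdin when no input file is given (as in the bwa-kit pipelines), and that each chunk keeps all records of a read name together. postalt_chunk, POSTALT_JS, and the chunk file names are made-up names for illustration.

```shell
# Path to the post-alt script, taken from the question; override if yours differs.
POSTALT_JS="${POSTALT_JS:-/usr/local/bin/bwa-postalt.js}"

# postalt_chunk CHUNK_BAM ALT_INDEX OUT_BAM
# Decode one BAM chunk to SAM (with header), run post-alt on it, and write
# the result back as a fast-compressed BAM, since it is only an intermediate.
postalt_chunk() {
    samtools view -h "$1" \
        | k8 "$POSTALT_JS" "$2" \
        | samtools view --fast -o "$3" -
}

# Possible use with GNU parallel in bash (not run here):
#   export POSTALT_JS; export -f postalt_chunk
#   parallel -j 8 postalt_chunk {} ref_fasta.alt {.}.postalt.bam ::: chunks/*.bam
```

Because every chunk is a complete BAM with its own header, there is no need to save the header separately and cat it back in front of the combined output.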

Hi Pierre Lindenbaum, would you happen to have any examples of this with Snakemake/Nextflow? I am thinking of splitting the BAM by chromosome, running my next process (k8 and bwa-postalt.js) on each piece, and then combining all BAMs for that sample with samtools merge. Any other ideas? Thanks in advance.

If you can arrange for the separate output files to be in a defined order, then you can use samtools cat instead of samtools merge to perform the Hadoop-style split + parallel-task + join strategy.
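
A sketch of the join step under that assumption. samtools cat concatenates the compressed BAM blocks as they are, in the order the files are listed, so it skips the decompress, sorted-merge, and recompress work that samtools merge does. The price is that the output order is simply the input order, which is why the chunks need a defined order. join_chunks and the chunk file names are hypothetical.

```shell
# join_chunks OUT CHUNK...
# Concatenate the chunk BAMs into OUT in exactly the order they are given.
join_chunks() {
    out="$1"
    shift
    samtools cat -o "$out" "$@"
}
```

For example: join_chunks sample_id.postalt.bam chunks/*.postalt.bam. The shell expands the glob in sorted order, so zero-padded chunk names give a stable, defined order.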

Thanks jkbonfield for all your tips. I am experimenting with Pierre's idea of using https://gatk.broadinstitute.org/hc/en-us/articles/360037064232-SplitSamByNumberOfReads-Picard- with workflow managers:

  • run the next step on each smaller BAM and then use samtools merge
  • you suggested samtools cat. Just curious: why samtools cat over samtools merge? You also said the files should be in a defined order; could you elaborate?
  • I will also try samtools cat

Thanks a lot
