Hi,
I have a question about the samtools view --fast option. I was trying to find any relevant docs and/or blogs detailing its usage and how best to use it. I could not find any, so I thought I would ask the Biostars community.
samtools view --help
Usage: samtools view [options] <in.bam>|<in.sam>|<in.cram> [region ...]
Output options:
-1, --fast Use fast BAM compression (and default to --bam)
- Has anyone had experience using this?
- In which situations will this option be useful?
Also, to give some context around the above: I am working on WGS data using Parabricks.
- Working on g4dn.12xlarge (4 GPUs, 48 CPUs, 192 GB RAM and 900 GB NVMe SSD): https://aws.amazon.com/ec2/instance-types/g4/
- Step 1 is to use pb fq2bam --align-only to get sample_id.bam
- Step 2 is the post-alt process below, which I am trying to speed up using a combination of GNU parallel and samtools view
This process takes quite a bit of time and I would appreciate any tips on speeding it up, hence the question on samtools view --fast too.
samtools view -H sample_id.bam > header
samtools view sample_id.bam | split -l 10000000 - split_bam_
ls split_bam_* \
| parallel -j 8 "/usr/local/bin/k8 /usr/local/bin/bwa-postalt.js ref_fasta.alt {}" \
| cat header - \
| samtools view --threads 8 -o sample_id.postalt.bam
Thanks in advance.
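On the --fast question above: -1 / --fast writes BAM at compression level 1, so less CPU time goes into compression and the file comes out larger. That tends to suit intermediate files that will be read once and deleted. A minimal sketch, assuming a hypothetical helper name (recompress_fast) and placeholder file names:

```shell
# Hypothetical helper (not part of samtools): rewrite a BAM using fast
# (level 1) compression. Arguments: input BAM, output BAM, thread count.
recompress_fast() {
    in_bam="$1"
    out_bam="$2"
    threads="${3:-4}"   # assumed default of 4 threads
    # -1 / --fast: fast BAM compression (implies BAM output)
    samtools view -1 --threads "$threads" -o "$out_bam" "$in_bam"
}

# Usage (placeholder file names):
#   recompress_fast sample_id.bam sample_id.fast.bam 8
```

When the output is piped straight into another samtools command rather than written to disk, -u (uncompressed BAM) skips compression altogether.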
Hi Pierre Lindenbaum - would you happen to have any examples of this with snakemake/nextflow? I am thinking of having the bam split by chrs, carrying out my next process (k8 and bwa-postalt.js), and then combining all bams for that sample by using samtools merge. Any other ideas? Thanks in advance.

If you can arrange the separate output files to be in a defined order then you can use samtools cat instead of samtools merge to perform the hadoop style split + parallel-task + join strategy.

Thanks jkbonfield for all your tips. I am trying to experiment with Pierre's idea of using https://gatk.broadinstitute.org/hc/en-us/articles/360037064232-SplitSamByNumberOfReads-Picard- with workflow managers.
On samtools merge and samtools cat:
- Just curious, why samtools cat over samtools merge?
- You also suggest having the files in a defined order - could you elaborate?
Thanks a lot
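The join step of the split + parallel-task + join strategy discussed above might look like the sketch below. samtools cat concatenates its inputs in the order given and does no sorting, whereas samtools merge merges sorted inputs so the result stays sorted, which is why the order of the chunks matters for cat. The helper name (join_chunks) and the part_NNNN.bam naming scheme are assumptions for illustration, not something from this thread:

```shell
# Hypothetical helper (not part of samtools): join per-chunk BAMs with
# samtools cat. First argument is the output BAM, the rest are the chunks.
# Assumptions: the chunks are passed in the intended final order and share
# a compatible header.
join_chunks() {
    out_bam="$1"
    shift
    # "$@" keeps the caller's order; samtools cat appends the files as given.
    samtools cat -o "$out_bam" "$@"
}

# Usage (placeholder names): zero-padded chunk names make the shell glob
# expand in the intended order.
#   join_chunks sample_id.postalt.bam part_0001.bam part_0002.bam part_0003.bam
#   join_chunks sample_id.postalt.bam part_*.bam
```

If the chunks cannot be produced in a known order, samtools merge on individually sorted chunks is the safer choice.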