Hi !
I have many .bam that I want to get their .bai using samtools in the terminal.
I tried the following command :
samtools index *.bam
However, I did not get any .bai file.
Regards
Hi !
I have many .bam that I want to get their .bai using samtools in the terminal.
I tried the following command :
samtools index *.bam
However, I did not get any .bai file.
Regards
samtools index supports the syntax
samtools index -M [-bc] [-m INT] <in1.bam> <in2.bam>...
using GNU parallel:
parallel samtools index ::: *.bam
Samtools index only accepts a single input file, so using a shell metacharacter to specify multiple files will not work. I usually use a shell wrapper to run samtools index on a single file at a time.
#!/usr/bin/env bash
# index_all.sh
for INFILE in "$@"
do
samtools index $INFILE
done
Then it is simple to run:
./index_all.sh /directory/*.bam
You need to point the results to a file to create this:
So for one file it would be
samtools index file.bam file.bam.bai
See this link for a great description.
In your case with many bam files I would do it in a shell script as follows:
#!/bin/bash
for I in *.bam
do
echo "Indexing: "$i
samtools index $i $i".bai"
done
@Devon Ryan: Yes indeed, with my samtools version I need to write samtools index foo.bam
So if I understood well , You are saying that the *.bam
won't work with my samtools version?
*.bam
won't work with any samtools version. I should note that understanding why this is the case will be helpful for you in general (this will be the case for many tools), so I'll explain that.
Assume you're in a directory with three BAM files: A.bam
, B.bam
and C.bam
. When you type samtools index *.bam
, your shell sees *.bam
and expands it. Consequently, what samtools sees you as running is samtools index A.bam B.bam C.bam
. That'd be fine if samtools index could accept more than one input file at a time, but it can't. Further, it may either then see you as using the alternate syntax that Jonathan Crowther mentioned or simply die due to not knowing what to do (I'd have to check the source code, though I expect the former would happen.
So, what you want to do is simply iterate over the files with a for loop:
for f in *.bam do samtools index $f done
bash one-liner loop:
# Considering that you are inside the folder where the bams are:
for bam in $(ls *.bam); do samtools index $bam; done
Starting in samtools 1.16, you can use
samtools index -M *.bam
Convert and sort sam to bam
samtools view -bS file.sam | samtools sort -o file.bam
Create an index for the sorted bam
samtools index file.bam file.bai
Count mapped reads
samtools view -bS file.sam | cut -f1 | sort | uniq | wc -l
Hi Waldeyr Mendes Cordeiro da Silva,
thanks for contributing. Please be sure though that you do not repost answers.
This answer has already been given 5 times in this thread including the accepted one. Also note that current samtools
versions do not require a view
command prior to sort
. sort
will happily read SAM and output BAM, please check the manual for details (-O BAM
option and automatic format detection based on output file suffix with -o
).
Also the read counting is not correct. First the view
will output a BAM file so a binary format that is not compatible with cut
etc. commands. Also even if it was a SAM file it would count the header (if you print it via samtools view -h
) but in any case it counts all reads (= also unmapped ones) so the result is not reliable. Use samtools flagstat
instead which is specialized code for exactly what you want to do. This works both on SAM/BAM/CRAM format.
use find and xargs
find . -name "*.bam" | xargs -n1 samtools index
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
thanks this is useful. Do you know what is the optimal way of setting -j ? Without setting it does it just default to max cores? Or should I set it to something total core minus 1. That is if I have 8 then I will just set it to 7? thanks.
From the manual:
So you shouldn't normally have to worry about it unless, for example, you want to constrain the number of jobs to be less than your number of CPU cores, or you want two jobs per CPU core.