Your question refers to SAM format, but in the event that you may also have a sorted/indexed BAM, you can simply do the following to get this very quickly.
samtools idxstats aln.bam | cut -f 1,3
The solution based on samtools idxstats aln.bam
should be used with caution. This is because AFAIK the numbers reported by samtools idxstats (& flagstat) represent the number of alignments of reads that are mapped to chromosomes, not the (non-redundant) number of reads, as stated in the documentation. I stumbled across this by observing that the total numbers reported by samtools idxstats (& flagstat) exceeded the total number of reads in my input fastq files. See further discussion on this issue in the comments on this post: http://left.subtree.org/2012/04/13/counting-the-number-of-reads-in-a-bam-file/
based on the first column only:
awk ' $1 !~ /@/ {print $1}' toy.sam | uniq -c
if you want to count only aligned reads you can also use awk to count only aligned reads.
Here's how I count the number per sequence of primary alignments with proper read pairs:
#!/usr/bin/env bash | |
# reads_per_seq | |
# | |
# Run 8 jobs in parallel on many BAM files: | |
# | |
# parallel -j8 reads_per_seq ::: *.bam | |
set -eou pipefail | |
if [[ $# -ne 1 || "$1" != *.bam ]]; then | |
echo "usage: $(basename $0) file.bam" | |
echo | |
echo " Produce a tab-delimited file.seq with number per sequence of" | |
echo " primary alignments with proper read pairs." | |
echo | |
exit 1 | |
fi | |
bam="$1" | |
# Read more about samtools options: http://www.htslib.org/doc/samtools.html | |
# | |
# Learn more about samtools flags: https://www.samformat.info/sam-format-flag | |
# | |
samtools view -f 2 -F 104 "$bam" \ | |
| cut -f3 | sort | uniq -c | sort -k1nr \ | |
| awk 'OFS="\t" {print $1,$2}' \ | |
> "${bam%.bam}.seq" |
Now there is a more convenient way with seqkit bam:
seqkit bam -C mybam.bam
This is the Samtools Solution + bash
Enjoy
PROCESSORS=10
BAM=blah.bam | blah.sam
samtools view --threads $PROCESSORS $BAM | cut -f 3 | sort | uniq -c | sort -nr | sed -e 's/^ *//;s/ /\t/' | awk 'OFS="\t" {print $2,$1}' | sort -n -k1,1 > Chr_freqs.txt
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Incidentally, I wrote a program that calculates read coverage from a sam file (number of reads mapped to each sequence, average depth, median depth, standard deviation, fraction of bases covered, etc).
It has a toggle for including or ignoring secondary alignments;
secondary=false
orsecondary=true
(default).When generating your SAM file make sure it has the header section before using pileup.sh.
samtools view -h file.bam > file.sam
Example pipe with filtering:
Brian, I'd be interested to read the source code of your program, but your link is broken. Could I ask you to provide a new link?
Sometimes Sourceforge goes down for a few hours for maintenance, or something. I put the latest version (35.14) on my Google drive; please let me know if you still have trouble accessing it.
https://drive.google.com/file/d/0B3llHR93L14wTzZSMjZvdjdmYzg/view?usp=sharing
The source code for pileup.sh is mainly in
/bbmap/current/jgi/CoveragePileup.java