Your question refers to SAM format, but in the event that you may also have a sorted/indexed BAM, you can simply do the following to get this very quickly.
samtools idxstats aln.bam | cut -f 1,3
The solution based on samtools idxstats aln.bam
should be used with caution. This is because AFAIK the numbers reported by samtools idxstats (& flagstat) represent the number of alignments of reads that are mapped to chromosomes, not the (non-redundant) number of reads, as stated in the documentation. I stumbled across this by observing that the total numbers reported by samtools idxstats (& flagstat) exceeded the total number of reads in my input fastq files. See further discussion on this issue in the comments on this post: http://left.subtree.org/2012/04/13/counting-the-number-of-reads-in-a-bam-file/
based on the first column only:
awk ' $1 !~ /@/ {print $1}' toy.sam | uniq -c
if you want to count only aligned reads you can also use awk to count only aligned reads.
Here's how I count the number per sequence of primary alignments with proper read pairs:
Now there is a more convenient way with seqkit bam:
seqkit bam -C mybam.bam
This is the Samtools Solution + bash
Enjoy
PROCESSORS=10
BAM=blah.bam | blah.sam
samtools view --threads $PROCESSORS $BAM | cut -f 3 | sort | uniq -c | sort -nr | sed -e 's/^ *//;s/ /\t/' | awk 'OFS="\t" {print $2,$1}' | sort -n -k1,1 > Chr_freqs.txt
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Incidentally, I wrote a program that calculates read coverage from a sam file (number of reads mapped to each sequence, average depth, median depth, standard deviation, fraction of bases covered, etc).
It has a toggle for including or ignoring secondary alignments;
secondary=false
orsecondary=true
(default).When generating your SAM file make sure it has the header section before using pileup.sh.
samtools view -h file.bam > file.sam
Example pipe with filtering:
Brian, I'd be interested to read the source code of your program, but your link is broken. Could I ask you to provide a new link?
Sometimes Sourceforge goes down for a few hours for maintenance, or something. I put the latest version (35.14) on my Google drive; please let me know if you still have trouble accessing it.
https://drive.google.com/file/d/0B3llHR93L14wTzZSMjZvdjdmYzg/view?usp=sharing
The source code for pileup.sh is mainly in
/bbmap/current/jgi/CoveragePileup.java