Dear all,
all new to samtools, I would like to learn more about the mpileup command. As a test, I have used simulated reads on the first 1 K of the E. coli genome as described here: [http://biobits.org/samtools_primer.html][1]. In short:
~/bioinf$ wgsim -N1000 -S1 -r0 genomes/E_coli_1K.fa simulated_reads/sim_reads.fq /dev/null
[wgsim] seed = 1
[wgsim_core] calculating the total length of the reference sequence...
[wgsim_core] 1 sequences, total length: 1000
~/bioinf$ bowtie2 -x genomes/b2_indices/E_coli_1K -U simulated_reads/sim_reads.fq -S alignments/E_coli_1K.sam
1000 reads; of these:
1000 (100.00%) were unpaired; of these:
30 (3.00%) aligned 0 times
970 (97.00%) aligned exactly 1 time
0 (0.00%) aligned >1 times
97.00% overall alignment rate
(By the way, why do 30 reads not align?)
samtools view -b -S -o alignments/E_coli_1K.bam alignments/E_coli_1K.sam
I have checked that the .sam and .bam file contain 1000 alignments each as they should.
~/bioinf$ samtools view -c alignments/E_coli_1K.sam
1000
~/bioinf$ samtools view -c alignments/E_coli_1K.bam
1000
When I invoke samtools mpileup
~/bioinf$ samtools mpileup -f genomes/E_coli_1K.fa alignments/E_coli_1K_test.bam
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
1:1000 40 G 1 ^K. 2
1:1000 41 T 1 . 2
1:1000 42 G 1 . 2
1:1000 43 G 1 . 2
1:1000 44 A 1 . 2
1:1000 45 T 1 . 2
1:1000 46 T 1 . 2
1:1000 47 A 1 . 2
1:1000 48 A 1 . 2
1:1000 49 A 1 . 2
1:1000 50 A 1 . 2
1:1000 51 A 1 C 2
1:1000 52 A 1 . 2
1:1000 53 A 1 . 2
1:1000 54 G 1 . 2
1:1000 55 A 1 . 2
1:1000 56 G 1 . 2
1:1000 57 T 1 . 2
1:1000 58 G 1 . 2
1:1000 59 T 1 . 2
1:1000 60 C 1 . 2
1:1000 61 T 1 . 2
1:1000 62 G 1 . 2
1:1000 63 A 1 . 2
1:1000 64 T 1 . 2
1:1000 65 A 1 . 2
1:1000 66 G 1 . 2
1:1000 67 C 1 . 2
1:1000 68 A 1 . 2
1:1000 69 G 1 . 2
1:1000 70 C 1 . 2
1:1000 71 T 1 . 2
1:1000 72 T 1 . 2
1:1000 73 C 1 . 2
1:1000 74 T 1 . 2
1:1000 75 G 1 . 2
1:1000 76 A 1 . 2
1:1000 77 A 1 . 2
1:1000 78 C 1 . 2
1:1000 79 T 1 . 2
1:1000 80 G 1 . 2
1:1000 81 G 1 . 2
1:1000 82 T 1 . 2
1:1000 83 T 1 . 2
1:1000 84 A 1 . 2
1:1000 85 C 1 . 2
1:1000 86 C 1 . 2
1:1000 87 T 1 . 2
1:1000 88 G 1 . 2
1:1000 89 C 1 . 2
1:1000 90 C 2 .^K. 22
1:1000 91 G 2 .. 22
1:1000 92 T 2 .. 22
1:1000 93 G 2 .. 22
1:1000 94 A 2 .. 22
1:1000 95 G 2 .. 22
1:1000 96 T 2 .. 22
1:1000 97 A 2 .. 22
1:1000 98 A 2 .. 22
1:1000 99 A 2 .. 22
1:1000 100 T 2 .. 22
1:1000 101 T 2 .. 22
1:1000 102 A 2 .. 22
1:1000 103 A 2 .. 22
1:1000 104 A 2 .. 22
1:1000 105 A 2 .. 22
1:1000 106 T 2 .. 22
1:1000 107 T 2 .. 22
1:1000 108 T 2 .. 22
1:1000 109 T 2 .$. 22
1:1000 110 A 1 . 2
1:1000 111 T 1 . 2
1:1000 112 T 1 . 2
1:1000 113 G 1 . 2
1:1000 114 A 1 . 2
1:1000 115 C 1 . 2
1:1000 116 T 1 . 2
1:1000 117 T 1 . 2
1:1000 118 A 1 . 2
1:1000 119 G 1 . 2
1:1000 120 G 1 . 2
1:1000 121 T 1 . 2
1:1000 122 C 1 . 2
1:1000 123 A 1 . 2
1:1000 124 C 1 . 2
1:1000 125 T 1 . 2
1:1000 126 A 1 . 2
1:1000 127 A 1 . 2
1:1000 128 A 1 . 2
1:1000 129 T 1 . 2
1:1000 130 A 1 . 2
1:1000 131 C 1 . 2
1:1000 132 T 1 . 2
1:1000 133 T 1 . 2
1:1000 134 T 1 . 2
1:1000 135 A 1 . 2
1:1000 136 A 1 . 2
1:1000 137 C 1 . 2
1:1000 138 C 1 . 2
1:1000 139 A 1 . 2
1:1000 140 A 1 . 2
1:1000 141 T 1 . 2
1:1000 142 A 1 . 2
1:1000 143 T 1 . 2
1:1000 144 A 1 . 2
1:1000 145 G 1 . 2
1:1000 146 G 1 . 2
1:1000 147 C 1 . 2
1:1000 148 A 1 . 2
1:1000 149 T 1 . 2
1:1000 150 A 1 . 2
1:1000 151 G 1 . 2
1:1000 152 C 1 . 2
1:1000 153 G 1 . 2
1:1000 154 C 1 . 2
1:1000 155 A 1 . 2
1:1000 156 C 1 . 2
1:1000 157 A 1 . 2
1:1000 158 G 1 . 2
1:1000 159 A 1 .$ 2
[bam_pileup_core] the input is not sorted (reads out of order)
Only positions 40-159 seem to be covered by reads and each one only once or twice at most. So what happened with the rest of my reads (some of which align to positions beyond 159 as you can check in the sam/bam file). Also, with 1000 reads in total, the coverage at the above positions should be a lot higher (and less uniform) than reported. I have tried to play around with -d and -B, all to no avail. What am I doing wrong?
Your help is much appreciated,
mce
Try sorting your BAM file.
Edit: OP figured it out on his/her own.