I've aligned ChIP-seq data that is publically available (SRP004897), with Bowtie2 -k 3 command. I need information on the number of aligned reads per chromosome for downstream processing, for which I've used samtools idxstats. The output of idxstats shows 0 aligned and 0 unaligned reads, see below:
head idxstats.txt
"chr" "chr_length" "mapped_reads" "unmapped_reads"
chr1 248956422 0 0
chr10 133797422 0 0
chr10_GL383545v1_alt 179254 0 0
chr10_GL383546v1_alt 309802 0 0
chr10_KI270824v1_alt 181496 0 0
chr10_KI270825v1_alt 188315 0 0
chr11 135086622 0 0
chr11_GL383547v1_alt 154407 0 0
In fact i should have around 10 million aligned reads, according to bowtie2 and also confirmed with samtools flagstats. Any idea what is the problem? I've tried with two different versions of samtools (v1.2 and v1.7).
make sure you have the correct index and not a leftover from another run
you can double check the numbers by counting with
This is a way to count all mapped reads to
chr1
The index was made yesterday and I have also tried re-doing the conversion sam to bam, sorting, and indexing. With samtools view command, i get the number of mapped reads so everything is correct, but still idxstats doesn't want to give me any reads.
Why not use the Bowtie alignment stats that are automatically output to terminal?
There may still be an issue with indexing, as per Istvan, because Bowtie uses different genome FASTA indices to SAMtools. Bowtie also encodes different flags into the SAM format, which are evidently incompatible with SAMtools in this case.
Typically one would use
samtools idxstats
on a bwa-aligned BAM. Bowtie, however, gives its own alignment stats, as to which I have implied above.Kevin
Bowtie output doesn't give the number of mapped reads per chromosome. I'm actually trying to replicate the data from an article and they claim they have used Bowtie2 for the alignment and after that, they have a script on GitHub which automates the downstream processing. But the script doesn't work due to this problem with samtools idxstats. If what you are saying is true about Bowtie using different FASTA indices, then the pipeline they claim to have used in the article is not correct.
There is a previous thread here: samtools idxstats on indexed BAM failed
However, you appear to have already tried the eventual solution, ie., sorting and indexing the BAM with samtools
The samtools genome indices are created with
samtools faidx
, by the way, but I'm not sure if it's truly relevant in this case. You may want to just tr it.By the way, you may want to follow this thread, in addition: Calculating amount of reads on certain chromosomes in bam file