I know the Bowtie manual states that it can exhibit strand bias when asked to report only a subset of all valid alignments. However, the manual also states that "Running Bowtie in --best mode eliminates strand bias". Nonetheless, I recently observed a strong strand bias (toward the top strand) with the following call:
bowtie -f -v 1 --best --strata -k 50 -S -p 3 [ebwt_base] [smallRNAs.fasta]
In contrast, if I call bowtie with -a instead of -k 50 to get all alignments, and then use a script to report up to 50 of them per read, chosen truly at random, the top-strand bias disappears. So it does not appear to be inherent in my data:
bowtie -f -v 1 -a --best --strata -S -p 3 [ebwt_base] [smallRNAs.fasta] | [random50_script]
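For reference, here is a minimal sketch of what such a [random50_script] could look like. It is my own Python illustration, not the script used above. It assumes that bowtie's -S output lists all alignments of a read on adjacent lines (I believe this holds, but if it does not for your run, sort by read name first). Header lines are passed through unchanged, and at most k alignments per read are kept, chosen uniformly at random with random.sample and written back in their original order. SAM tags are not modified. The file name subsample_sam.py is hypothetical.

```python
import random
import sys
from itertools import groupby


def _group_key(line):
    """None for SAM header lines, otherwise the read name (QNAME, field 1)."""
    return None if line.startswith("@") else line.split("\t", 1)[0]


def subsample_sam(lines, k, rng=random):
    """Yield header lines unchanged, then at most k randomly chosen
    alignment lines per read.

    Assumes all alignments of a read are adjacent in the input.
    """
    for name, group in groupby(lines, key=_group_key):
        group = list(group)
        if name is None:  # SAM header block: pass through as-is
            yield from group
        elif len(group) <= k:  # few enough hits: keep them all
            yield from group
        else:
            # Uniformly pick k indices, then emit them in original order.
            for i in sorted(rng.sample(range(len(group)), k)):
                yield group[i]


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.stdout.writelines(subsample_sam(sys.stdin, int(sys.argv[1])))
```

Usage, mirroring the call above:
bowtie -f -v 1 -a --best --strata -S -p 3 [ebwt_base] [smallRNAs.fasta] | python subsample_sam.py 50 > subsampled.sam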
So, did I misunderstand the manual? Or is one of the other settings (--strata?) responsible?
NB: for context, the reads in this case are ~20-24 nt small RNA-seq reads from plants, many of which derive from repeats, so retaining multimappers is of interest to me. The -k 50 limit was intended to report some of the multiple mappings while keeping the file sizes manageable.
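One way to put a number on the bias, e.g. to compare the -k 50 output with the -a plus random-subsampling output, is to count alignments per strand from the SAM FLAG field: bit 0x4 marks an unmapped read and bit 0x10 marks an alignment to the reverse (bottom) strand. Below is a minimal Python sketch. Note that it counts alignments, not reads, so a multimapper contributes once per reported hit. The script name count_strands.py is hypothetical.

```python
import sys
from collections import Counter


def count_strands(sam_lines):
    """Count mapped alignments per strand from SAM-formatted lines.

    Header lines start with '@'. FLAG bit 0x4 = unmapped (no strand),
    FLAG bit 0x10 = aligned to the reverse strand.
    """
    counts = Counter(forward=0, reverse=0)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank and header lines
        flag = int(line.split("\t", 2)[1])  # FLAG is the 2nd field
        if flag & 0x4:
            continue  # unmapped record
        counts["reverse" if flag & 0x10 else "forward"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as fh:
        c = count_strands(fh)
    total = c["forward"] + c["reverse"]
    frac = c["forward"] / total if total else float("nan")
    print(f"forward={c['forward']} reverse={c['reverse']} "
          f"forward_fraction={frac:.3f}")
```

Running it as python count_strands.py output.sam on both outputs should show a forward fraction well above 0.5 for the biased one.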
One thing that comes to mind is that bowtie may apply secondary/tertiary ordering after the alignment score, with strand as one of the keys. Or it may search one strand first and only then the other. Limiting the output is tricky, as an unbiased subset would require randomly shuffling all hits before reporting, and doing that every time would be quite wasteful. This is a nice reminder to everyone to look out for such artifacts.
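A toy illustration of the second hypothesis, with made-up data: suppose a repeat-derived read has 200 equally good hits, 100 per strand, and the search happens to enumerate all top-strand hits first (an assumption about search order, not documented Bowtie behavior). Stopping after the first 50 then reports only top-strand hits, while a uniform sample does not. The sketch uses reservoir sampling (Algorithm R), which draws a uniform sample of k hits in one pass while holding only k hits in memory, so no full shuffle or buffer of all hits is needed. It does, however, still have to enumerate every hit, which is the real cost that -a pays.

```python
import random
from itertools import islice


def first_k(hits, k):
    """Report the first k hits in enumeration order, then stop."""
    return list(islice(hits, k))


def reservoir_k(hits, k, rng=random):
    """Algorithm R: uniform random sample of up to k hits in a single
    pass, keeping only k hits in memory."""
    sample = []
    for i, hit in enumerate(hits):
        if i < k:
            sample.append(hit)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                sample[j] = hit  # hit i is kept with probability k/(i+1)
    return sample


def top_fraction(sample):
    """Fraction of sampled hits on the top ('+') strand."""
    return sum(1 for strand, _ in sample if strand == "+") / len(sample)


# Made-up read: 200 equally good hits, 100 per strand, '+' hits enumerated first.
hits = [("+", pos) for pos in range(100)] + [("-", pos) for pos in range(100)]

if __name__ == "__main__":
    rng = random.Random(1)
    print("first 50:    ", top_fraction(first_k(hits, 50)))  # 1.0
    print("reservoir 50:", top_fraction(reservoir_k(hits, 50, rng)))
```

Averaged over many draws, the reservoir sample's top-strand fraction sits near 0.5, matching the behavior seen with -a plus random subsampling.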
There was another discussion on the random number generator in Bowtie2: "ATTENTION: bowtie2 and multiple hits".