Hello
I have transcriptomic data from an Illumina single-end, strand-specific experiment; it is a metatranscriptomic sample. Specifically, I am analyzing bacterial transcripts.
I am using Bowtie2 to map the reads to a reference genome, but the reads are strand-specific and, as far as I can tell, Bowtie2 does not have an option for that. That is why I want to use HISAT2, so that I can select its strand-specific option.
Since HISAT2 considers splicing and bacteria do not have splicing, do you recommend using the --no-spliced-alignment option for the alignment?
Why do you not recommend the Tuxedo pipeline? I am new to this topic. What other programs can I use?
Thank you
The Tuxedo pipeline is one of the older RNA-seq analysis suites and is deprecated to some extent. Its developers now recommend HISAT2, the newer aligner they wrote.
Since the original question was about bacterial RNA-seq, there is no need to worry about splicing over long distances, so pretty much any NGS aligner could be used for the alignment. BBMap, bwa, and bowtie2 (among many others) are examples of such programs.
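If you do go with HISAT2 as asked in the question, here is a minimal sketch of how the command could be assembled. The index prefix and file names are placeholders I made up, and the helper `hisat2_command` is purely illustrative. `--no-spliced-alignment` turns off spliced alignment, and `--rna-strandness` takes `F` or `R` for single-end reads; which of the two is right depends on your library prep, so check that before trusting the default used here.

```python
import shlex


def hisat2_command(index_prefix, reads_fastq, out_sam, strandness="F", threads=4):
    """Build (but do not run) a HISAT2 command for unspliced, single-end reads.

    All paths are hypothetical placeholders; strandness must be "F" or "R"
    for single-end libraries.
    """
    if strandness not in ("F", "R"):
        raise ValueError("single-end strandness must be 'F' or 'R'")
    return [
        "hisat2",
        "--no-spliced-alignment",          # bacteria: no introns to span
        "--rna-strandness", strandness,    # strand-specific library
        "-p", str(threads),                # number of alignment threads
        "-x", index_prefix,                # index built with hisat2-build
        "-U", reads_fastq,                 # unpaired (single-end) reads
        "-S", out_sam,                     # SAM output file
    ]


cmd = hisat2_command("genome_index", "reads.fastq.gz", "aligned.sam")
print(shlex.join(cmd))
```

To actually run it you could pass the list to `subprocess.run(cmd, check=True)`, assuming `hisat2` is on your PATH.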
There's an interesting caveat I just discovered. Apparently, bwa and bowtie2 do not add the NH tag to the resulting SAM/BAM files (that's the optional tag recording how many alignments were reported for a given read, so, simply put, it's what lets you keep track of multimapping reads).
So, any type of downstream processing that "rescues" multimappers grinds to a halt here - you're pretty much stuck with only uniquely aligned reads. While that's mostly OK for many genes (bacterial genomes are not very repetitive most of the time), you will completely miss out on perfectly duplicated genes, as well as on many short RNAs. So far I've seen as many as 20% of reads multimapping in bacterial RNA-seq, although 2-5% seems to be more common with longer reads.
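One possible workaround, sketched below, is to add the NH tag yourself by counting how many alignment records each read name has in the SAM output. This is only an illustration, not something the aligners provide: the function name `add_nh_tags` is made up, it assumes single-end reads (so one read name means one read), and it only helps if the aligner was asked to report more than one alignment per read in the first place (for example bowtie2's `-k` mode).

```python
from collections import Counter


def add_nh_tags(sam_lines):
    """Return SAM lines with NH:i:<n> appended to every mapped record.

    n is the number of non-supplementary alignment records sharing the
    read name. Header lines and unmapped records are passed through.
    """
    lines = [ln.rstrip("\n") for ln in sam_lines]

    # First pass: count reported alignments per read name.
    counts = Counter()
    for ln in lines:
        if not ln or ln.startswith("@"):
            continue
        fields = ln.split("\t")
        flag = int(fields[1])
        if flag & 0x4 or flag & 0x800:   # skip unmapped and supplementary
            continue
        counts[fields[0]] += 1

    # Second pass: append the tag to mapped records.
    tagged = []
    for ln in lines:
        if not ln or ln.startswith("@"):
            tagged.append(ln)
            continue
        fields = ln.split("\t")
        if int(fields[1]) & 0x4:         # unmapped: leave untouched
            tagged.append(ln)
            continue
        # Drop any existing NH tag among the optional fields (column 12 on).
        optional = [f for f in fields[11:] if not f.startswith("NH:i:")]
        optional.append(f"NH:i:{counts[fields[0]]}")
        tagged.append("\t".join(fields[:11] + optional))
    return tagged
```

For real data you would stream this over the aligner's SAM output grouped by read name rather than holding everything in memory; the two-pass list version is just the easiest to read.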