Using HISAT2 with strand-specific bacterial sequences
8.3 years ago
vm.higareda

Hello

I have transcriptomic data from an Illumina single-end, strand-specific experiment; it is a metatranscriptomic sample. Specifically, I am analyzing the bacterial transcripts.

I am using Bowtie2 to map the reads to a reference genome, but the library is strand-specific and Bowtie2 has no option for that. That is why I want to use HISAT2, which lets me specify the strandedness.

Since HISAT2 considers splicing and bacteria do not splice, would you recommend using the option --no-spliced-alignment for the alignment?
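For reference, here is a minimal sketch of how such a HISAT2 call could be assembled for single-end stranded reads with splicing turned off. The index and file names (`bacteria_index`, `reads.fastq.gz`, `aligned.sam`) and the helper `build_hisat2_command` are hypothetical. Whether `--rna-strandness` should be `F` or `R` depends on the library kit (dUTP-style protocols typically give `R` for single-end reads), so treat the default here as an assumption to check against your kit.

```python
import shlex

def build_hisat2_command(index, reads, out_sam, strandness="R", spliced=False):
    """Assemble (but do not run) a HISAT2 command for single-end stranded reads.

    index, reads and out_sam are placeholder names; for single-end data
    HISAT2's --rna-strandness takes "F" or "R" (assumed "R" here).
    """
    if strandness not in ("F", "R"):
        raise ValueError("single-end --rna-strandness must be 'F' or 'R'")
    cmd = ["hisat2", "-x", index, "-U", reads, "-S", out_sam,
           "--rna-strandness", strandness]
    if not spliced:
        # Bacterial reads: tell HISAT2 not to search for spliced alignments.
        cmd.append("--no-spliced-alignment")
    return cmd

cmd = build_hisat2_command("bacteria_index", "reads.fastq.gz", "aligned.sam")
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` once the placeholder paths point at real files.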

alignment RNA-Seq

8.3 years ago

I would say that you should not need to worry about reads being mapped as spliced if there is no actual splicing in the data.

So there is no need to pass that option.

8.3 years ago

Bacterial RNA-seq data does have a little splicing; it's just rare and typically short, due to self-splicing transcripts. I think it probably does not matter much for the most part, but better safe than sorry... so I'd suggest allowing spliced alignments while restricting the maximum splice length to something fairly short, like 50 bp. I'm not sure what the actual length range of self-splicing is, but I seem to recall it's generally around 20 bp.
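A minimal sketch of the suggestion above: keep spliced alignment enabled but cap the gap length with HISAT2's `--max-intronlen`. The helper name and file names are hypothetical, and the 50 bp cap is the rough value suggested in this answer, not a validated figure for bacterial self-splicing.

```python
import shlex

def hisat2_short_splice_command(index, reads, out_sam, max_intron=50):
    """Build (not run) a HISAT2 command that allows only short spliced gaps.

    File names are placeholders; max_intron=50 follows the rough suggestion
    in the text and should be tuned for the organism at hand.
    """
    if max_intron <= 0:
        raise ValueError("max_intron must be positive")
    return ["hisat2", "-x", index, "-U", reads, "-S", out_sam,
            "--max-intronlen", str(max_intron)]

print(shlex.join(hisat2_short_splice_command(
    "bacteria_index", "reads.fastq.gz", "aligned.sam")))
```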

Whether the library is strand-specific matters for analysis, not mapping. Unless you are using the Tuxedo pipeline for analysis (which I do not recommend), there's no particular reason to let strand-specificity affect your choice of aligner.
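To illustrate why strandedness is an analysis-stage concern, here is a minimal sketch (hypothetical helper names) of how a counting step could infer the transcript strand of a single-end read from its SAM FLAG, whichever aligner produced it. Counting tools such as featureCounts (`-s`) and htseq-count (`--stranded`) expose the same choice as an option; `"forward"`/`"reverse"` here are assumed labels for the two single-end library orientations.

```python
REVERSE = 0x10  # SAM FLAG bit: read sequence is reverse-complemented

def transcript_strand(sam_flag: int, library: str) -> str:
    """Strand of the originating transcript for a mapped single-end read.

    library: "forward" if the read matches the transcript sequence,
    "reverse" if it is the reverse complement (e.g. dUTP-style kits).
    """
    read_strand = "-" if sam_flag & REVERSE else "+"
    if library == "forward":
        return read_strand
    if library == "reverse":
        return "-" if read_strand == "+" else "+"
    raise ValueError("library must be 'forward' or 'reverse'")

def counts_as_sense(sam_flag: int, gene_strand: str, library: str) -> bool:
    """True if the read should be counted for a gene on gene_strand."""
    return transcript_strand(sam_flag, library) == gene_strand

# A read aligned to the reverse strand (FLAG 16) from a "reverse" library
# comes from a transcript on the "+" strand.
print(transcript_strand(16, "reverse"))
```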

Why do you not recommend the Tuxedo pipeline? I am new to this topic. What other programs can I use?

Thank you

The Tuxedo pipeline is one of the older RNA-seq analysis pipelines and is deprecated to some extent. Its developers now recommend HISAT2, the newer program they wrote.

Since the original question was about bacterial RNA-seq, one does not need to worry about long-distance splicing, so pretty much any NGS aligner could be used. BBMap, BWA and Bowtie2 (among many others) are examples of such programs.

There's an interesting caveat I just discovered. Apparently, BWA and Bowtie2 do not add the NH tag to the resulting SAM/BAM files. That tag records how many times a particular read ID was reported as mapped; simply put, it's what lets you keep track of multimapping reads.
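If an aligner does not write NH, it can in principle be reconstructed afterwards. Below is a minimal sketch (hypothetical helper `add_nh_tags`, toy read names) for single-end SAM text. It assumes the aligner was run in a mode that writes every reported alignment as its own record (for Bowtie2 something like `-k`, since by default it reports only one alignment per read) and that records do not already carry an NH tag. For real BAM files a library such as pysam would be the practical route.

```python
from collections import Counter

UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped

def add_nh_tags(sam_lines):
    """Append NH:i:<n> to each mapped record of a single-end SAM file.

    n = number of mapped records sharing the same read name (QNAME).
    Header lines (starting with "@") and unmapped records are left as-is.
    """
    # Drop trailing newlines and blank lines.
    records = [line.rstrip("\n") for line in sam_lines if line.strip()]

    # Pass 1: count mapped records per QNAME (column 1).
    hits = Counter()
    for line in records:
        if line.startswith("@"):
            continue
        fields = line.split("\t")
        if not (int(fields[1]) & UNMAPPED):
            hits[fields[0]] += 1

    # Pass 2: append the count as an NH tag to every mapped record.
    tagged = []
    for line in records:
        if line.startswith("@"):
            tagged.append(line)
            continue
        fields = line.split("\t")
        if not (int(fields[1]) & UNMAPPED):
            fields.append(f"NH:i:{hits[fields[0]]}")
        tagged.append("\t".join(fields))
    return tagged

# Toy input: r1 has two hits (primary + secondary), r2 one, r3 is unmapped.
example_sam = [
    "@SQ\tSN:chr\tLN:1000",
    "r1\t0\tchr\t100\t1\t4M\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tchr\t500\t1\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr\t200\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
for line in add_nh_tags(example_sam):
    print(line)
```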

So any downstream processing that "rescues" multimappers grinds to a halt here; you're pretty much stuck with only uniquely aligned reads. While that's mostly OK for many genes (bacterial genomes are usually not very repetitive), you will completely miss perfectly duplicated genes, as well as many short RNAs. So far I've seen as much as 20% multimapping reads in bacterial RNA-seq, although 2-5% seems to be more common with longer reads.
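To check where your own data falls in that range, a minimal sketch (hypothetical helper, toy read names) is to count how many mapped reads were reported at more than one location. It assumes you have the QNAME of every mapped alignment record from a run that reports multiple hits per read, and it measures the fraction against mapped reads; the percentages above do not say which denominator was used.

```python
from collections import Counter

def multimapping_fraction(read_names):
    """Fraction of mapped reads reported at more than one location.

    read_names: QNAME of every mapped alignment record (one entry per record).
    """
    per_read = Counter(read_names)
    if not per_read:
        return 0.0
    multi = sum(1 for n in per_read.values() if n > 1)
    return multi / len(per_read)

# Toy example: 5 distinct reads, of which r1 and r4 map more than once.
names = ["r1", "r1", "r2", "r3", "r4", "r4", "r4", "r5"]
print(f"{multimapping_fraction(names):.0%}")  # prints "40%"
```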
