Question

Using Hisat2 with strand specific bacteria sequences

8.4 years ago

vm.higareda ▴ 30

Hello

I have transcriptomic data from an Illumina single-end, strand-specific experiment; it is a metatranscriptomic sample. I am analyzing the bacterial transcripts.

I am using bowtie2 to map the sequences to a reference genome, but they are strand-specific, and Bowtie2 does not have that option. That is why I want to use hisat2 in order to select the strand specific option.

hisat2 considers splicing, but bacteria do not have splicing. Do you recommend using the option --no-spliced-alignment for the alignment?

alignment RNA-Seq • 5.2k views

updated 21 months ago by Ram 44k • written 8.4 years ago by vm.higareda ▴ 30

score 1 · Answer 1 · 2016-08-04

8.4 years ago

Istvan Albert 102k

I would say that you should not need to worry about getting reads mapped as spliced if there is no actual splicing taking place in the data.

So there is no need to pass that option.

score 1 · Answer 2 · 2016-08-05

8.4 years ago

Brian Bushnell 20k

Bacterial RNA-seq data does have a little splicing, it's just rare and typically short, due to self-splicing transcripts. I think it probably does not matter a lot for the most part, but better safe than sorry... so, I'd suggest allowing spliced alignments, but restricting the maximum splice length to something fairly short, like 50bp; I'm not sure what the actual range is of self-splicing, but I seem to recall it's generally around 20bp.
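For HISAT2 specifically, the short-splice suggestion above could be expressed roughly as follows. This is only a sketch: the helper name `hisat2_command` and the file names are hypothetical, the 50 bp cap is the guess from the paragraph above rather than a validated value, and the function only builds the argument list without running anything. The flags used (`-x`, `-U`, `-S`, `--max-intronlen`, `--no-spliced-alignment`, `--rna-strandness`) are taken from the HISAT2 manual.

```python
def hisat2_command(index, reads, out_sam, max_intron=50, strandness=None):
    """Build a HISAT2 argument list for single-end reads (nothing is executed).

    max_intron=None switches spliced alignment off entirely;
    any integer caps the allowed intron (splice) length instead.
    strandness is "F" or "R" for single-end stranded libraries, or None.
    """
    cmd = ["hisat2", "-x", index, "-U", reads, "-S", out_sam]
    if max_intron is None:
        cmd.append("--no-spliced-alignment")
    else:
        cmd += ["--max-intronlen", str(max_intron)]
    if strandness is not None:
        if strandness not in ("F", "R"):
            raise ValueError("single-end strandness must be 'F' or 'R'")
        cmd += ["--rna-strandness", strandness]
    return cmd


# Hypothetical index and file names, for illustration only.
print(" ".join(hisat2_command("genome_idx", "reads.fq", "out.sam", strandness="R")))
```

The resulting list can be handed to `subprocess.run` once the paths point at real files.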

Whether the library is strand-specific is important to analysis, not mapping. Unless you are using the Tuxedo pipeline for analysis (which I do not recommend), there's no particular reason to let strand-specificity impact your choice of aligner.
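To illustrate why strandedness can be handled at analysis time, here is a minimal sketch that infers the transcript strand of a single-end read from its SAM FLAG after mapping with any aligner. The function name `transcript_strand` is hypothetical, and the "F"/"R" labels follow the HISAT2 convention (F: the read has the same sense as the transcript; R: the read is the reverse complement of the transcript). Which of the two applies depends on the library prep, so treat the default as an assumption.

```python
def transcript_strand(flag, library="R"):
    """Return '+' or '-' for the transcript strand, or None if the read is unmapped.

    flag    : integer SAM FLAG of a single-end alignment record
    library : "F" if reads have the same sense as the transcript,
              "R" if reads are the reverse complement of the transcript
    """
    if flag & 0x4:  # 0x4 = read unmapped
        return None
    read_strand = "-" if flag & 0x10 else "+"  # 0x10 = read mapped to reverse strand
    if library == "F":
        return read_strand
    if library == "R":
        return "+" if read_strand == "-" else "-"
    raise ValueError("library must be 'F' or 'R'")
```

Counting tools generally do the same thing internally through a strandedness setting, so the aligner itself does not need to know the library type.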

Why do you not recommend the Tuxedo pipeline? I am new to this topic. What other programs can I use?

Thank you

written 8.2 years ago by vm.higareda ▴ 30

Tuxedo pipeline is one of the older examples of RNAseq data analysis programs and is deprecated to some extent. Developers of tuxedo now recommend that you use HISAT2 which is the new program they wrote.

Since the original question was about bacterial RNAseq, one does not need to worry about splicing over long distances, so pretty much any NGS aligner could be used for alignment. BBMap, bwa, bowtie2 (and many others) are examples of this type of program.

written 8.2 years ago by GenoMax 148k

There's an interesting caveat I just discovered. Apparently, bwa and bowtie2 do not add the NH tag to the resulting SAM/BAM files (that's the tag indicating how many alignments were reported for a given read, so, simply put, it's what lets you keep track of multimapping reads).

So, any type of downstream processing that "rescues" multimappers grinds to a halt here - you're pretty much stuck with only uniquely aligned reads. While that's mostly OK for many genes (bacterial genomes are not very repetitive most of the time), you will completely miss out on perfectly duplicated genes, as well as on many short RNAs. So far I've seen as much as 20% multimapping reads in bacterial RNA-seq, although 2-5% seems to be more common with longer reads.
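One possible workaround is to reconstruct NH yourself by counting alignment records per read name. The sketch below does that on plain SAM text. It rests on several assumptions: the reads are single-end (as in the original question), the aligner was asked to report multiple alignments per read (for example bowtie2 `-k` or bwa mem `-a`; with default settings there is usually only one record per read, so every count would be 1), and the function name `add_nh_tags` is hypothetical rather than part of any tool.

```python
from collections import Counter

UNMAPPED = 0x4          # SAM FLAG bit: read unmapped
SUPPLEMENTARY = 0x800   # SAM FLAG bit: supplementary (chimeric) record


def add_nh_tags(sam_lines):
    """Return SAM lines with an NH:i:<n> tag appended to mapped records.

    n is the number of primary + secondary records sharing the read name.
    Assumes single-end reads; mates of a pair would be counted together.
    """
    records = [line.rstrip("\n") for line in sam_lines]

    # First pass: count reported alignments per read name.
    hits = Counter()
    for rec in records:
        if rec.startswith("@"):  # header line
            continue
        name, flag = rec.split("\t")[:2]
        if int(flag) & (UNMAPPED | SUPPLEMENTARY):
            continue
        hits[name] += 1

    # Second pass: append the tag where it is missing.
    tagged = []
    for rec in records:
        if rec.startswith("@"):
            tagged.append(rec)
            continue
        fields = rec.split("\t")
        n = hits[fields[0]]
        has_nh = any(f.startswith("NH:i:") for f in fields[11:])
        if n > 0 and not int(fields[1]) & UNMAPPED and not has_nh:
            fields.append(f"NH:i:{n}")
        tagged.append("\t".join(fields))
    return tagged
```

Because it needs all records of a read before it can write any of them, the sketch holds the whole file in memory; for large files, sorting by read name and streaming group by group would be the more practical route.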

written 6.4 years ago by predeus ★ 2.1k