Hi, I am new to RNA-seq. I would like to know which of TopHat2 and HISAT2 is the better aligner for RNA-seq data. Or is there an even better option?
Simulation-based comprehensive benchmarking of RNA-seq aligners
Everyone should read this and decide for themselves.
Spoiler: Don't use TopHat!
The most widely cited tool underperforms for most metrics, particularly when using default settings.
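For anyone wondering what "underperforms for most metrics" means in practice: with simulated reads, the true origin of every read is known, so an aligner's output can be scored directly against it. Below is a minimal Python sketch of read-level precision and recall. It assumes each alignment has been reduced to a (chromosome, start) pair. The function name and the tolerance rule are my own simplifications, not the scoring scheme used in the paper.

```python
from typing import Dict, Optional, Tuple

# Hypothetical simplification: an alignment is reduced to (chromosome, 1-based start).
Alignment = Tuple[str, int]


def read_level_accuracy(
    truth: Dict[str, Alignment],
    predicted: Dict[str, Optional[Alignment]],
    tolerance: int = 0,
) -> Dict[str, float]:
    """Score aligner output against simulated true positions.

    A read is correct if it lands on the true chromosome within `tolerance`
    bases of the true start. Unaligned reads are absent or None.
    """
    aligned = 0
    correct = 0
    for read_id, (true_chrom, true_start) in truth.items():
        pred = predicted.get(read_id)
        if pred is None:
            continue  # unaligned: lowers recall but not precision
        aligned += 1
        chrom, start = pred
        if chrom == true_chrom and abs(start - true_start) <= tolerance:
            correct += 1
    precision = correct / aligned if aligned else 0.0
    recall = correct / len(truth) if truth else 0.0
    return {"precision": precision, "recall": recall}


if __name__ == "__main__":
    # Made-up example: 4 simulated reads, 3 aligned, 2 aligned correctly.
    truth = {"r1": ("chr1", 100), "r2": ("chr1", 500), "r3": ("chr2", 42), "r4": ("chr3", 7)}
    predicted = {"r1": ("chr1", 100), "r2": ("chr1", 900), "r3": ("chr2", 42)}
    print(read_level_accuracy(truth, predicted))
```

Counting unaligned reads against recall but not precision is what lets a conservative aligner look precise while still missing many reads, which is why benchmarks usually report both.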
Another thing to consider is how robust the results are to different parameters. In the Baruzzo study, almost all the aligners could be configured to give good results, but they differed in how well their default settings performed, and STAR looked pretty good on that score. I have to say, though, that we use HISAT a lot just because of how easy it is and how few resources it requires.
I think BBMap and STAR are better options, and TopHat is generally not very good. I wrote BBMap; it's very accurate and indexes very quickly. STAR uses slightly more memory but is quite a bit faster.
However, this kind of question will spawn a lot of differing opinions.
1.- Do not use TopHat2. Nowadays you have many much better options; just look at the benchmarks.
2.- STAR is pretty good, and I would totally recommend it. However, the issue with STAR is its high memory requirement. If you are working with human and you have less than 28 GB of RAM, you should use HISAT2 instead. Otherwise, both aligners should perform very similarly.
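If you are unsure which side of that memory line your machine falls on, here is a rough Python sketch of the rule of thumb. The 28 GB threshold is the figure from this answer for a human index (the real requirement depends on the genome and index options); `choose_aligner` and `total_ram_gb` are hypothetical helpers. Reading memory via `os.sysconf` with these names works on Linux; other platforms may not expose them.

```python
import os

# Rule of thumb from this answer: below ~28 GB of RAM, use HISAT2 for human.
STAR_HUMAN_MIN_RAM_GB = 28.0


def total_ram_gb() -> float:
    """Total physical memory in GiB, read via POSIX sysconf (Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


def choose_aligner(ram_gb: float, min_star_ram_gb: float = STAR_HUMAN_MIN_RAM_GB) -> str:
    """Return "STAR" if there is enough RAM for its index, else "HISAT2"."""
    return "STAR" if ram_gb >= min_star_ram_gb else "HISAT2"


if __name__ == "__main__":
    ram = total_ram_gb()
    print(f"{ram:.1f} GiB RAM -> {choose_aligner(ram)}")
```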
I would say it depends on what you want to do with your data.
I've sometimes found that the TopHat alignments work better than STAR alignments with some splicing analysis programs, possibly due to the format of the alignment.
I would consider the run-time for TopHat sufficiently quick that you could run comparisons and see what works best with your data (while the benchmark papers can be a useful starting point, the optimal strategy is not necessarily the same for every dataset). So, if the combination of the latest aligner and a downstream algorithm gives results that don't make sense, it may be a good idea to try other aligners / algorithms.
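One cheap way to run such a comparison is to check how well the significant gene lists from different pipelines agree. Below is a minimal Python sketch, assuming you already have a set of significant gene IDs for each aligner/downstream combination. The pipeline labels and gene IDs are hypothetical, and Jaccard overlap is just one possible agreement measure.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Intersection size divided by union size (1.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_agreement(results: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Jaccard overlap of significant-gene sets for every pair of pipelines."""
    return {
        (p, q): jaccard(results[p], results[q])
        for p, q in combinations(sorted(results), 2)
    }


if __name__ == "__main__":
    # Hypothetical pipeline labels and gene IDs, for illustration only.
    results = {
        "tophat2_pipeline": {"GENE_A", "GENE_B", "GENE_C"},
        "star_pipeline": {"GENE_B", "GENE_C", "GENE_D"},
        "salmon_pipeline": {"GENE_B", "GENE_C"},
    }
    for pair, score in pairwise_agreement(results).items():
        print(pair, round(score, 2))
```

Genes that only one pipeline calls are the ones worth inspecting by eye (for example in a genome browser) before trusting either result.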
There are also options for gene expression quantification without the alignment step (Salmon, kallisto, Sailfish, etc.). If you have a two-group comparison with triplicates and clear expression differences, then that should work fine. However, I've found that the gene assignments for a given sample may be less accurate, and having replicates can give you some sense of the robustness of the read / expression assignments for transcripts (or the sum of transcripts for genes).
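To make the transcript-versus-gene point concrete: alignment-free tools report per-transcript estimates, and a gene-level value can be taken as the sum over that gene's transcripts. The Python sketch below assumes a transcript-to-gene mapping and uses made-up numbers for two replicates. It shows how the split between transcripts can vary between replicates while the gene sum stays stable. The helper names are hypothetical, and the coefficient of variation is one simple robustness measure, not a formal test.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List


def transcripts_to_genes(tx_values: Dict[str, float], tx2gene: Dict[str, str]) -> Dict[str, float]:
    """Sum transcript-level estimates (e.g. estimated counts) per gene."""
    genes: Dict[str, float] = defaultdict(float)
    for tx, value in tx_values.items():
        genes[tx2gene[tx]] += value
    return dict(genes)


def replicate_cv(replicates: List[Dict[str, float]]) -> Dict[str, float]:
    """Coefficient of variation (population SD / mean) per feature across replicates.

    Features missing from a replicate count as 0; features with mean 0 get NaN.
    """
    features = set().union(*replicates)
    cv: Dict[str, float] = {}
    for f in sorted(features):
        values = [rep.get(f, 0.0) for rep in replicates]
        m = mean(values)
        cv[f] = pstdev(values) / m if m > 0 else float("nan")
    return cv


if __name__ == "__main__":
    # Hypothetical mapping and estimates for two replicates.
    tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
    rep1 = {"tx1": 10.0, "tx2": 5.0, "tx3": 20.0}
    rep2 = {"tx1": 2.0, "tx2": 13.0, "tx3": 20.0}
    print("transcript CV:", replicate_cv([rep1, rep2]))
    genes = [transcripts_to_genes(r, tx2gene) for r in (rep1, rep2)]
    print("gene CV:", replicate_cv(genes))
```

Here tx1 and tx2 swap most of their reads between replicates (high transcript-level CV), yet geneA totals 15 in both, which is the kind of pattern replicates let you see.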
I would suggest STAR, or maybe, if it fits your needs, alignment-free methods such as kallisto and salmon.
Thank you for all your suggestions.
Between those two, HISAT2 is the more recent, so you would want to use that.