Question

HISAT2 or Tophat2

0

7.8 years ago

Payal ▴ 160

Hi, I am new to RNA-seq. I would like to know which of TopHat2 and HISAT2 is the better aligner for RNA-seq data. Or is there an even better option?

rna-seq alignment • 15k views

ADD COMMENT • link updated 7.8 years ago by Charles Warden 8.3k • written 7.8 years ago by Payal ▴ 160

1

Between those two, HISAT2 is the newer one, so you would want to use that.

ADD REPLY • link 7.8 years ago by GenoMax 146k

2

7.8 years ago

WouterDeCoster 47k

I would suggest STAR, or maybe, if it fits your needs, alignment-free methods such as kallisto and salmon.

ADD COMMENT • link 7.8 years ago by WouterDeCoster 47k

0

7.8 years ago

Payal ▴ 160

Thank you for all your suggestions.

ADD COMMENT • link 7.8 years ago by Payal ▴ 160

score 11 · Accepted Answer · 2017-01-09

11

7.8 years ago

poisonAlien ★ 3.2k

Simulation-based comprehensive benchmarking of RNA-seq aligners

One should definitely read this and decide themselves.

Spoiler: Don't use TopHat!

The most widely cited tool underperforms for most metrics, particularly when using default settings.

ADD COMMENT • link 7.8 years ago by poisonAlien ★ 3.2k

2

I have to give you an upvote because TopHat is so terrible, yet so widely used.

ADD REPLY • link 7.8 years ago by Brian Bushnell 20k

1

Another thing that should be considered is how robust the results are to different parameters. In the Baruzzo study, almost all the aligners could be configured to give good results, but they differed in the performance of the default options, with STAR looking pretty good in those terms. I have to say though, we use HISAT a lot just because of how easy it is and how few resources it requires.

ADD REPLY • link 7.8 years ago by i.sudbery 20k

0

That paper has "some" issues. Don't just read the conclusion; be very critical of their methods. For example, they used the beta version of HISAT2 and not the latest.

ADD REPLY • link 7.8 years ago by WouterDeCoster 47k

0

Agreed. Regardless, the above conclusion wouldn't have changed IMO.

ADD REPLY • link 7.8 years ago by poisonAlien ★ 3.2k

score 3 · Accepted Answer · 2017-01-09

3

7.8 years ago

Brian Bushnell 20k

I think BBMap and STAR are better options, and TopHat is generally not very good. I wrote BBMap; it's very accurate and indexes very quickly. STAR uses slightly more memory but is quite a bit faster.

However, this kind of question will spawn a lot of differing opinions.

ADD COMMENT • link 7.8 years ago by Brian Bushnell 20k

score 3 · Accepted Answer · 2017-01-10

1.- Do not use TopHat2. Nowadays you have many much better options; just look at the benchmarks.

2.- STAR is pretty good, and I would totally recommend it. However, the issue with STAR is its high memory requirement. If you are working with human data and have less than 28 GB of RAM, you should use HISAT2 instead. Otherwise, both aligners should perform very similarly.
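For what it's worth, the rule of thumb above can be written down as a tiny helper. This is only a sketch: the function name `suggest_aligner` and the constant name are made up for illustration, and the 28 GB figure is simply the number quoted in this answer for human data, not a measured requirement.

```python
# Threshold quoted in the answer above for human data (an assumption,
# not a measured or documented requirement).
HUMAN_STAR_MIN_RAM_GB = 28


def suggest_aligner(available_ram_gb: float) -> str:
    """Suggest an aligner for human RNA-seq from available RAM.

    Hypothetical helper: returns "HISAT2" below the quoted threshold,
    otherwise "STAR".
    """
    if available_ram_gb < HUMAN_STAR_MIN_RAM_GB:
        return "HISAT2"
    return "STAR"


if __name__ == "__main__":
    for ram in (16, 28, 64):
        print(f"{ram} GB -> {suggest_aligner(ram)}")
```

With more RAM than the threshold, the answer considers both aligners roughly equivalent, so returning "STAR" there is just one reasonable default.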

score 3 · Accepted Answer · 2017-01-11

I would say it depends on what you want to do with your data.

I've sometimes found that the TopHat alignments work better than STAR alignments with some splicing analysis programs, possibly due to the format of the alignment.

I would consider the run-time for TopHat to be sufficiently quick that you could run comparisons and see what works best with your data (while the benchmark papers can be a useful starting point, the optimal strategy is not necessarily the same for every dataset). So, if the combination of latest aligner and downstream algorithm gives results that don't make sense, it may be a good idea to try other aligners / algorithms.

There are also options for gene expression quantification without the alignment step (Salmon, kallisto, Sailfish, etc.). If you have a two-group comparison with triplicates and clear expression differences, then that should work fine. However, I've found that gene assignments for a given sample may be less accurate, and having replicates can give you some sense of the robustness of the read / expression assignments for transcripts (or the sum of transcripts for genes).
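To make "the sum of transcripts for genes" concrete, here is a minimal sketch of collapsing transcript-level estimates to gene level. The function name `sum_to_gene_level`, the transcript and gene IDs, and the numbers are all made up for illustration; in practice the transcript-to-gene mapping would come from your annotation, and this is not the output format of any particular tool.

```python
from collections import defaultdict


def sum_to_gene_level(transcript_values, tx2gene):
    """Sum per-transcript estimates into per-gene totals.

    transcript_values: dict of transcript ID -> estimated count
    tx2gene: dict of transcript ID -> gene ID
    Transcripts missing from tx2gene are skipped.
    """
    gene_totals = defaultdict(float)
    for tx_id, value in transcript_values.items():
        gene_id = tx2gene.get(tx_id)
        if gene_id is None:
            continue  # no gene annotation for this transcript
        gene_totals[gene_id] += value
    return dict(gene_totals)


if __name__ == "__main__":
    # Toy values, not real data.
    estimates = {"tx1": 10.0, "tx2": 5.5, "tx3": 3.0, "tx_unknown": 7.0}
    mapping = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
    print(sum_to_gene_level(estimates, mapping))
```

Running the same summation on each replicate and comparing the gene totals is one simple way to get a feel for how stable the assignments are.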