Question

hisat2 parameters tuning to match tophat2 parameters

8.1 years ago

pierre.ortalo • 0

Hi, a quick introduction: some other bioinformatics students and I are working on an RNA-seq project over the summer to get our hands dirty and gain some experience. The project consists of reproducing a research team's RNA-seq pipeline on another dataset.

We would like to move from tophat2 (used by the team we work for) to hisat2, for two reasons. First, when multithreading on a computer farm, we get a "[failed]" outcome at the "writting tophat reports" step. Second, hisat2 is much more efficient and we would like to learn to use this newer software. Moreover, reproducing the pipeline strategy with different software could further strengthen its "proof of concept".

However, we are having difficulty setting up the parameters.

It would be very kind of you if you could give us some guidance on how to reproduce these tophat2 parameters in hisat2:

(all other parameters as default):

“--min-intron-length 10 --max-intron-length 20000 --read-mismatches 3 --read-gap-length 2 --read-edit-dist 3 --max-multihits 2 --b2-sensitive --segment-mismatches 2 --segment-length 15 --min-segment-intron 10 --max-segment-intron 20000 --no-coverage-search”.

Another run using:

“--read-gap-length 1000 --read-edit-dist 1003 --b2-ma 3 --b2-rdg 3,1”

I understand that it is a lot to ask, but this is an obstacle at a very early step of the pipeline, and the hisat2 parameters are still very cryptic to us. Even just an explanation of some underlying concepts that would help us do it ourselves would be much appreciated!

Thanks in advance

RNA-Seq hisat2 tophat2 parameters • 2.9k views

Take a look at "Simulation-based comprehensive benchmarking of RNA-seq aligners" and see if it helps (indirectly).

Comment by GenoMax (152k), 8.1 years ago

As a general rule of thumb for most bioinformatics tools: the default settings should be reasonable for standard situations. Only when your dataset is "different" should you start fiddling with the parameters.

Comment by WouterDeCoster (48k), 8.1 years ago

score 5 · Answer 1 · 2017-06-20

It appears that most of the parameters you are asking about have counterparts in the hisat2 manual:

--min-intronlen <int>
--max-intronlen <int>
--rdg <int1>,<int2>

etc. So working out which parameters stayed the same would simplify your question. Some of the others probably don't apply at all, since the algorithm has changed internally.
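As a rough illustration of sorting the options into "has a similarly named hisat2 counterpart" and "no obvious counterpart", here is a minimal Python sketch. The mapping table is an assumption made for illustration, not an official correspondence: the intron-length and `--rdg` options are the ones named above, while treating tophat2's `--max-multihits` as roughly comparable to hisat2's `-k` is a guess that should be checked against both manuals. The function name `translate_tophat2_args` is hypothetical.

```python
import shlex

# Approximate correspondences only (an assumption for illustration, not an
# official mapping): the two aligners use different algorithms internally.
TOPHAT2_TO_HISAT2 = {
    "--min-intron-length": "--min-intronlen",
    "--max-intron-length": "--max-intronlen",
    "--b2-rdg": "--rdg",       # read gap open,extend penalties
    "--max-multihits": "-k",   # roughly: max alignments reported per read
}


def translate_tophat2_args(tophat2_args):
    """Split a tophat2 option string into (hisat2 tokens, unmapped flags)."""
    tokens = shlex.split(tophat2_args)
    translated, unmapped = [], []
    i = 0
    while i < len(tokens):
        flag = tokens[i]
        # A token that does not start with "-" is the value of the flag before it.
        has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("-")
        if flag in TOPHAT2_TO_HISAT2:
            translated.append(TOPHAT2_TO_HISAT2[flag])
            if has_value:
                translated.append(tokens[i + 1])
        else:
            # No counterpart assumed here; look these up in the manual by hand.
            unmapped.append(flag)
        i += 2 if has_value else 1
    return translated, unmapped


if __name__ == "__main__":
    first_run = (
        "--min-intron-length 10 --max-intron-length 20000 --read-mismatches 3 "
        "--read-gap-length 2 --read-edit-dist 3 --max-multihits 2 --b2-sensitive "
        "--segment-mismatches 2 --segment-length 15 --min-segment-intron 10 "
        "--max-segment-intron 20000 --no-coverage-search"
    )
    hisat2_tokens, unmapped = translate_tophat2_args(first_run)
    print("hisat2 options:", " ".join(hisat2_tokens))
    print("no counterpart assumed for:", ", ".join(unmapped))
```

The unmapped list (segment options, coverage search, edit distance and so on) is the part that most likely has no meaningful equivalent, which is where the defaults are probably the safer choice.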

In general, I would not try to force one aligner to work exactly the same way as another. In addition, I would be very cautious about setting this many parameters. Users are often under the impression that tools work exactly as described and that they understand what the parameters do and how they interact.

In my opinion, this is rarely the case. Not even the developer of the software may fully understand the many ways these parameters interact (not to mention the unexpected effects of the order in which the various conditions are applied).

If your reporting crashes it is not because the aligner did not work the "right" way - it is because your reporting relies on features it should not.

Take comfort in believing what the authors state: that HISAT2 is a more efficient and better aligner than TopHat2, so you probably don't even need all those settings.