Question

Problem in indexing toplevel genome with HISAT2

1

6.3 years ago

Batu ▴ 300

As I mentioned in my previous post, I was unable to index a toplevel genome (both unmasked and soft-masked) with HISAT2, and I still have the problem. I'm using the command below:

hisat2-build -f Mus_musculus.GRCm38.dna.toplevel.fa.gz Cm3895_ht2/GRCm38

First, it prints these warnings many times:

Warning: Encountered empty reference sequence
Warning: Encountered reference sequence with only gaps

After some time, it fails with the error below:

Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Reference file does not seem to be a FASTA file
  Time to join reference sequences: 00:00:00
Total time for call to driver() for forward index: 00:28:31
Error: Encountered internal HISAT2 exception (#1)
Command: hisat2-build --wrapper basic-0 -f Mus_musculus.GRCm38.dna.toplevel.fa.gz Cm3895_ht2/GRCm38 
Deleting "Cm3895_ht2/GRCm38.1.ht2" file written during aborted indexing attempt.
Deleting "Cm3895_ht2/GRCm38.2.ht2" file written during aborted indexing attempt.
Deleting "Cm3895_ht2/GRCm38.3.ht2" file written during aborted indexing attempt.
Deleting "Cm3895_ht2/GRCm38.4.ht2" file written during aborted indexing attempt.
Deleting "Cm3895_ht2/GRCm38.5.ht2" file written during aborted indexing attempt.
Deleting "Cm3895_ht2/GRCm38.6.ht2" file written during aborted indexing attempt.
Deleting "Cm3895_ht2/GRCm38.7.ht2" file written during aborted indexing attempt.
Deleting "Cm3895_ht2/GRCm38.8.ht2" file written during aborted indexing attempt.

Previously, I had no problem when using separate chromosome files. Is there anything I'm missing when using the toplevel genome? Thanks.

RNA-Seq genome hisat2 hisat2-build index • 6.0k views

updated 21 months ago by Apex92 ▴ 320 • written 6.3 years ago by Batu ▴ 300

2

I guess the answer is in the error log: 'Reference file does not seem to be a FASTA file'. Try decompressing the reference file to plain FASTA and running the index build again.
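A quick way to confirm this diagnosis before rebuilding is to look at the first bytes of the file. The sketch below is only an illustration: `is_gzip` and `looks_like_fasta` are hypothetical helper names, not part of HISAT2. It relies on two facts: gzip streams start with the bytes 0x1f 0x8b, and a FASTA file starts with a `>` header line.

```shell
# Hypothetical helpers (not part of HISAT2) for inspecting a reference file.

# Succeeds if the file starts with the gzip magic bytes 0x1f 0x8b.
is_gzip() {
    [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

# Succeeds if the first character is '>', as expected for a FASTA header line.
looks_like_fasta() {
    [ "$(head -c 1 "$1")" = ">" ]
}

# Example use (file name taken from the question):
#   is_gzip Mus_musculus.GRCm38.dna.toplevel.fa.gz && echo "still compressed"
```

If `is_gzip` succeeds and `looks_like_fasta` fails on the same file, the file is still compressed, which matches the error message quoted above.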

6.3 years ago by ahaswer ▴ 150

0

Yes, it worked after unpacking. Gzipped files normally work with the main hisat2 command, so this cause didn't occur to me. Thank you.

6.3 years ago by Batu ▴ 300

1

Worth reading: https://lh3.github.io/2017/11/13/which-human-reference-genome-to-use

6.3 years ago by WouterDeCoster 48k

score 4 · Accepted Answer · 2019-02-28

4

6.3 years ago

Batu ▴ 300

It worked after unpacking the genome. I hadn't realized that hisat2-build would not accept the gzipped file, since gzipped files work with the main hisat2 command. Problem solved!
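For reference, the fix amounts to decompressing the genome and pointing hisat2-build at the plain FASTA. Here is a minimal sketch, assuming `gzip`/`gunzip` are available; `ensure_plain_fasta` is a hypothetical helper name introduced only for this example, and the file and index names are the ones from the question.

```shell
# Hypothetical helper (not part of HISAT2): print the path of a plain-text
# copy of the input, decompressing it first if it is a valid gzip file.
ensure_plain_fasta() {
    input="$1"
    if gzip -t "$input" 2>/dev/null; then
        # Valid gzip: write the decompressed copy next to the original.
        output="${input%.gz}"
        # If the name had no .gz suffix, pick a different output name.
        [ "$output" = "$input" ] && output="${input}.unpacked"
        gunzip -c "$input" > "$output"
        printf '%s\n' "$output"
    else
        # Not gzip: assume the file is already plain text.
        printf '%s\n' "$input"
    fi
}

# Example use (not run here; needs the genome file and HISAT2 installed):
#   fasta=$(ensure_plain_fasta Mus_musculus.GRCm38.dna.toplevel.fa.gz)
#   hisat2-build -f "$fasta" Cm3895_ht2/GRCm38
```

Using `gunzip -c` keeps the original .gz file in place, at the cost of extra disk space for the decompressed genome.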

6.3 years ago by Batu ▴ 300

0

I encountered the same problem and solved it by decompressing the genome file. Thank you for bringing this up as a question.

21 months ago by Apex92 ▴ 320