hisat2-build indexing produces more than 8 output files
7.0 years ago
ddowlin ▴ 70

Hi all,

I am trying to assemble a primate transcriptome using hisat2/stringtie. First I want to index the genome using hisat2. I used both --ss and --exon to provide information on splice sites and exons.

The manual states that 8 files should be produced after indexing, 1.ht2 to 8.ht2. However, I only have six files with this suffix (5.ht2 and 6.ht2 are missing), and one file is completely empty. Additionally, I have 20 .rf files (0.rf to 19.rf).

Does anyone know why this is? Was there an error with indexing or can I use these files as is?

hisat2 indexing • 6.6k views

You can download pre-built indexes from the HISAT2 website (https://ccb.jhu.edu/software/hisat2/index.shtml). You may also download the genome_snp, genome_tran, and/or genome_snp_tran indexes.

Thanks for the suggestion. Unfortunately pre-made indices aren't available for the species I am interested in.

Can you try without the --ss and --exon options? These options require a lot of RAM. If it's a large genome, indexing may have stopped with an error.

Best,

Thanks--I tried without the --ss and --exon options and it seems to have worked OK.

Yup. So my guess is that your indexing may have stopped due to RAM issues. You can still provide the 'splice sites' file at the time of alignment.
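
To illustrate passing splice sites at alignment time, here is a minimal Python sketch that only assembles the hisat2 command line. The `--known-splicesite-infile`, `-x`, `-U`, `-S` and `-p` options are documented HISAT2 options; the function name and all file names are hypothetical placeholders, and the splice-site file is assumed to be the one you already made for `--ss` (for example with `hisat2_extract_splice_sites.py`).

```python
import shlex


def hisat2_align_command(index_prefix, reads, splice_sites, out_sam, threads=4):
    """Build a hisat2 command that supplies known splice sites at alignment time.

    The index here is assumed to have been built WITHOUT --ss/--exon.
    """
    return [
        "hisat2",
        "-p", str(threads),
        "-x", index_prefix,  # index basename, without the .1.ht2 etc. suffixes
        "--known-splicesite-infile", splice_sites,
        "-U", reads,         # unpaired reads; paired reads would use -1/-2 instead
        "-S", out_sam,
    ]


# Hypothetical file names, for illustration only.
cmd = hisat2_align_command(
    "primate_index", "reads.fq.gz", "splice_sites.txt", "aligned.sam"
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run(cmd, check=True)` on a machine where hisat2 is installed.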

Best

If there was no error produced during index build then go ahead and use them. Programs often will store these indexes in formats they choose/like. If there was an error then post that here.

I know this post is old, but I thought I'd reply to this specific comment since I struggled with this issue for a while. I think HISAT2 is supposed to produce exactly 8 files, and the explanation Satyajeet Khare gave may be the most correct one. I also got more than 8 files every time my process terminated prematurely due to RAM issues. I think the .rf files are temporary files and should NOT be used for alignment.
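
In case it helps, the check I do by eye can be sketched in Python as below. This is only a rough sanity check based on what is described in this thread: it assumes a complete index has files `.1.ht2` through `.8.ht2` next to the basename, treats a zero-byte file as suspicious, and assumes any `*.rf` files in the same directory are leftovers from an interrupted build. The function name and the `primate_index` basename are made up, and the `suffix` parameter is there because large indexes use `.ht2l` instead of `.ht2`.

```python
from pathlib import Path


def check_hisat2_index(prefix, suffix="ht2"):
    """Report missing, empty, and leftover files for a HISAT2 index basename."""
    prefix = Path(prefix)
    # Expected files: <prefix>.1.<suffix> ... <prefix>.8.<suffix>
    expected = [Path(f"{prefix}.{i}.{suffix}") for i in range(1, 9)]
    missing = [p.name for p in expected if not p.exists()]
    empty = [p.name for p in expected if p.exists() and p.stat().st_size == 0]
    # Assumption: .rf files in the index directory are temporary build files.
    leftover_rf = sorted(p.name for p in prefix.parent.glob("*.rf"))
    return {
        "missing": missing,
        "empty": empty,
        "leftover_rf": leftover_rf,
        "complete": not missing and not empty,
    }


# Hypothetical basename; prints the report for whatever is in the current directory.
print(check_hisat2_index("primate_index"))
```

A report with anything in `missing` or `empty` suggests the build did not finish. Passing this check does not prove the index is valid, so a small test alignment is still worth doing.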

Do you know if your build run log had something in it that indicated a problem when you had the truncation happen? If HISAT2 devs have not done their due diligence to flag that in the log output then that is not a good thing.
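
One way to make sure there is a log to look at is to run the build through a small wrapper that saves all output and the exit code. The sketch below is generic Python, not anything HISAT2-specific: `run_with_log` is a hypothetical helper, the file names are placeholders, and whether hisat2-build actually reports an out-of-memory kill through its exit code or log is exactly the open question here, so treat a zero exit code as necessary but not sufficient.

```python
import subprocess


def run_with_log(cmd, log_path):
    """Run cmd, write its combined stdout/stderr to log_path, return the exit code."""
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


# Hypothetical build command; --ss and --exon are the options discussed above.
build_cmd = [
    "hisat2-build", "-p", "8",
    "--ss", "splice_sites.txt",
    "--exon", "exons.txt",
    "genome.fa", "primate_index",
]

# Uncomment on a machine where hisat2-build is installed:
# code = run_with_log(build_cmd, "hisat2_build.log")
# if code != 0:
#     print(f"hisat2-build exited with code {code}; see hisat2_build.log")
```

On Linux, if the process launched by `subprocess.run` is itself killed by a signal, the return code is negative (for example -9 for SIGKILL), which is one hint that the job was stopped from outside rather than finishing normally.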

Were you able to align data with what may be truncated index files? One would expect HISAT2 to throw an error if it detects an incomplete or corrupt index set.

Edit: hisat2-build manual says the following, which is incorrect. There should be 8 files in properly built indexes.

hisat2-build outputs a set of 6 files with suffixes .1.ht2, .2.ht2, .3.ht2, .4.ht2, .5.ht2, .6.ht2, .7.ht2, and .8.ht2.

Hi all, I indexed the genome using hisat2 and used both --ss and --exon to provide information on splice sites and exons, and I ran into the same problem. I also got only six files with this suffix (5.ht2 and 6.ht2 are missing). Have you solved the problem? Do you know the reason?

Have you tried using the index for an alignment? Don't go by the number of files alone. That said, properly built indexes should have 8 files.

Thanks for your help. I have not mapped the reads. OK. I will try.

I am having a similar issue: many .rf files but fewer than 8 .ht2 files. Was this ever resolved? Are they okay to use?

I doubt it is ok to use them. I have the same problem. Only 4 files are generated. I tried to use them for alignment but it won't work.
