Question

HISAT2 build command is taking too long

Asked by Peter, 3.4 years ago

Hello

I used the following commands:

hisat2_extract_splice_sites.py Homo_sapiens.GRCh38.80.gtf > splice_sites.txt


hisat2_extract_exons.py Homo_sapiens.GRCh38.80.gtf > exons.txt

hisat2-build referenceData/fasta/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa \
--ss referenceData/hisat2_index/splice_sites.txt \
--exon referenceData/hisat2_index/exons.txt \
referenceData/hisat2_index/GRCh38.hisat2

Settings:
  Output files: "referenceData/hisat2_index/GRCh38.hisat2.*.ht2"
  Line rate: 7 (line is 128 bytes)
  Lines per side: 1 (side is 128 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Local offset rate: 3 (one in 8)
  Local fTable chars: 6
  Local sequence length: 57344
  Local sequence overlap between two consecutive indexes: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  referenceData/fasta/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa
Reading reference sizes
  Time reading reference sizes: 00:00:22
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:16
  Time to read SNPs and splice sites: 00:00:02

It has been running for over an hour. So far, these output files have been created in my directory:

GRCh38.hisat2.0.rf (27 GB)
GRCh38.hisat2.1.ht2 (8.2 KB)
GRCh38.hisat2.2.ht2 (0 bytes)
GRCh38.hisat2.3.ht2 (11.3 KB)
GRCh38.hisat2.4.ht2 (736 MB)
GRCh38.hisat2.7.ht2 (13.1 MB)
GRCh38.hisat2.8.ht2 (2.6 MB)

It hasn't given any errors yet, but I'm worried. This is my first time analyzing RNA-seq data. Does anyone know what's going on?

Thanks!

Tags: linux, histat

As long as the program is running and producing output, there is nothing to worry about. Be patient and wait.

Reply by GenoMax, 3.4 years ago
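One way to confirm that the build is still "working and producing output" is to check that the process is alive and that the index files keep growing. This is a minimal sketch, assuming a Linux machine with procps (`pgrep`, `ps`); `show_build_progress` is a hypothetical helper name, not part of HISAT2.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HISAT2): report whether hisat2-build is
# still running and list the current size of every file under the prefix.
show_build_progress() {
    local prefix="$1"   # e.g. referenceData/hisat2_index/GRCh38.hisat2

    # pgrep -f matches the full command line, so it also catches the
    # hisat2-build-s / hisat2-build-l binaries started by the wrapper script.
    if pgrep -f hisat2-build > /dev/null 2>&1; then
        echo "hisat2-build is running"
        # Elapsed time and resident memory (KB) of each matching process.
        ps -o pid=,etime=,rss=,comm= -p "$(pgrep -d, -f hisat2-build)"
    else
        echo "hisat2-build is not running"
    fi

    # Current size of each output file; sizes that change between calls
    # suggest the build is still making progress.
    local f
    for f in "$prefix".*; do
        [ -e "$f" ] || continue   # the glob matched nothing
        printf '%s\t%s\n' "$(du -h "$f" | cut -f1)" "$f"
    done
}
```

Calling `show_build_progress referenceData/hisat2_index/GRCh38.hisat2` every few minutes should show whether the file sizes are still changing.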

One hour for a full human index with SNPs on a single core is not much. It will take some time. Coffee and wait.

Reply by ATpoint, 3.4 years ago
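Since the build in the question runs on a single core, passing `-p` (number of build threads) to hisat2-build may shorten the wall-clock time. The sketch below also starts the build under `nohup` with a log file so it survives a closed terminal. `start_index_build`, the default of 8 threads, and the `DRY_RUN` switch are all hypothetical choices for illustration; more threads are not expected to remove the large memory requirement of a `--ss`/`--exon` build.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: start hisat2-build in the background with several
# threads and capture its output. Set DRY_RUN=1 to only print the command.
start_index_build() {
    local fasta="$1" prefix="$2" threads="${3:-8}"   # 8 threads is an assumption
    local cmd=(hisat2-build -p "$threads" "$fasta" "$prefix")

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "${cmd[*]}"
        return 0
    fi

    # nohup keeps the build alive if the terminal session is closed;
    # stdout and stderr both go to a log file next to the index.
    nohup "${cmd[@]}" > "${prefix}.build.log" 2>&1 &
    echo "started hisat2-build as PID $!, log: ${prefix}.build.log"
}
```

For example: `start_index_build referenceData/fasta/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa referenceData/hisat2_index/GRCh38.hisat2 8`.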

I agree that one hour is not that long for building a human HISAT2 index. Since you are building it with --ss and --exon, it will need about 200 GB of RAM according to the manual. If you need a pre-built index, the HISAT2 website (http://daehwankimlab.github.io/hisat2/download/) has several available for download.

Reply by vitor, 3.4 years ago
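Before starting a `--ss`/`--exon` build, it can help to compare the machine's RAM with the roughly 200 GB figure quoted above. A minimal sketch, assuming Linux (`/proc/meminfo`, which reports `MemTotal` in kB); `check_memory_for_ss_exon` and its optional arguments are hypothetical, and the 200 GB default is simply the number from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: compare installed RAM with the ~200 GB the
# thread quotes from the HISAT2 manual for builds with --ss/--exon.
# The meminfo path is overridable so the check can be tested on a fake file.
check_memory_for_ss_exon() {
    local meminfo="${1:-/proc/meminfo}"
    local required_gb="${2:-200}"
    local total_kb total_gb

    # MemTotal is reported in kB, e.g. "MemTotal:  263842540 kB".
    total_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    [ -n "$total_kb" ] || { echo "cannot read MemTotal from $meminfo" >&2; return 2; }
    total_gb=$(( total_kb / 1024 / 1024 ))   # integer GB, rounded down

    if (( total_gb >= required_gb )); then
        echo "ok: ${total_gb} GB RAM is at or above the quoted ${required_gb} GB"
        return 0
    else
        echo "warning: ${total_gb} GB RAM < ${required_gb} GB; consider dropping --ss/--exon or using a pre-built index"
        return 1
    fi
}
```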

Thanks, that worked! I removed --ss and --exon.

The output was eight .ht2 files of different sizes.

Reply by Peter, 3.4 years ago
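A quick way to sanity-check a finished index like this is to confirm that all eight files exist and none is zero bytes. This sketch assumes a standard (non-graph, small-index) build that writes `PREFIX.1.ht2` through `PREFIX.8.ht2`; `check_ht2_index` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for a finished HISAT2 index built without
# --ss/--exon: expects PREFIX.1.ht2 ... PREFIX.8.ht2, all non-empty.
check_ht2_index() {
    local prefix="$1" i f status=0
    for i in 1 2 3 4 5 6 7 8; do
        f="${prefix}.${i}.ht2"
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            status=1
        elif [ ! -s "$f" ]; then
            # -s is false for zero-byte files, like the 0-byte .2.ht2 in the question.
            echo "empty:   $f"
            status=1
        fi
    done
    [ "$status" -eq 0 ] && echo "ok: 8 non-empty .ht2 files for $prefix"
    return "$status"
}
```

For example: `check_ht2_index referenceData/hisat2_index/GRCh38.hisat2`.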

In my experience, there is a problem if one of the .ht2 index files is empty. This seems to happen whenever the index build step does not have enough memory, and you won't be able to align reads with that index until it is rebuilt.

For reference, I have only managed to build a HISAT2 index with 200 GB of RAM, which lines up with the recommendation in the manual.

Reply by rependo, 3.4 years ago
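To see how much memory a build actually needed, one option is to run it under GNU time, for example `/usr/bin/time -v -o build.time hisat2-build ...`, and read the peak resident memory from the report afterwards. This assumes `/usr/bin/time` is GNU time (its `-v` report includes a "Maximum resident set size (kbytes)" line); `peak_rss_gb` and the file name `build.time` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the peak memory of a finished run from the
# report written by GNU time's -v option (assumes /usr/bin/time is GNU time).
peak_rss_gb() {
    local report="$1" kb
    # Report line looks like: "	Maximum resident set size (kbytes): 209715200"
    kb=$(awk -F': ' '/Maximum resident set size/ {print $2}' "$report")
    [ -n "$kb" ] || { echo "no peak RSS found in $report" >&2; return 1; }
    # Convert kbytes to GB with one decimal place.
    awk -v kb="$kb" 'BEGIN { printf "%.1f\n", kb / 1024 / 1024 }'
}
```

Comparing `peak_rss_gb build.time` with the machine's RAM shows how close a build came to the limit.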