Hisat2 index builder seems to be running indefinitely
Whirlingdaf, 5.3 years ago

Hello, I am attempting to create a new HISAT2 index from Ensembl reference files, and the index builder is taking far longer than I am used to when building a new index. The build command has now been running for more than 48 hours, and I am confused about why it is taking so long and whether it is working at all.

I am running: hisat2-build -p 6 --ss /path/to/CanFam3.1.97_intron.bed --exon /path/to/CanFam3.1.97_exonsFile.table -f /path/to/Canis_familiaris.CanFam3.1.dna.toplevel.fa CanFam3.1.97
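Before rebuilding, it may be worth confirming that the two files passed to --ss and --exon look the way hisat2-build expects. Below is a minimal sketch, assuming the four tab-separated columns (chromosome, 0-based left, 0-based right, strand) written by HISAT2's helper scripts hisat2_extract_splice_sites.py and hisat2_extract_exons.py; the function name check_annotation_lines is hypothetical. It flags malformed lines and chromosome names that do not appear in the FASTA.

```python
import io
from typing import Iterable, List, Set


def check_annotation_lines(lines: Iterable[str], known_chroms: Set[str]) -> List[str]:
    """Return a list of problems found in a --ss or --exon style file.

    Assumes each non-empty line is chrom<TAB>left<TAB>right<TAB>strand
    (an assumed layout, matching HISAT2's hisat2_extract_* helper output).
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # blank lines are ignored
        fields = line.split("\t")
        if len(fields) != 4:
            problems.append(f"line {lineno}: expected 4 tab-separated fields, got {len(fields)}")
            continue
        chrom, left, right, strand = fields
        if chrom not in known_chroms:
            # Names must match the FASTA record names, e.g. "X", not "chrX"
            problems.append(f"line {lineno}: chromosome {chrom!r} not in FASTA")
        if not (left.isdigit() and right.isdigit()):
            problems.append(f"line {lineno}: coordinates are not non-negative integers")
        elif int(left) > int(right):
            problems.append(f"line {lineno}: left ({left}) is greater than right ({right})")
        if strand not in ("+", "-"):
            problems.append(f"line {lineno}: unexpected strand {strand!r}")
    return problems


if __name__ == "__main__":
    # Made-up lines: one valid, one with a "chr" prefix, one space-separated
    sample = io.StringIO("X\t5715\t6000\t+\nchrX\t10\t20\t+\nX 1 2 +\n")
    for problem in check_annotation_lines(sample, known_chroms={"1", "X"}):
        print(problem)
```

In practice, known_chroms would be filled from the FASTA headers (the first token after each ">"), and the function run over the .bed and .table files given to --ss and --exon.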

And the output I have gotten from this run so far is:

Settings:
  Output files: "CanFam3.1.97.*.ht2"
  Line rate: 7 (line is 128 bytes)
  Lines per side: 1 (side is 128 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Local offset rate: 3 (one in 8)
  Local fTable chars: 6
  Local sequence length: 57344
  Local sequence overlap between two consecutive indexes: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  /scratch/clove/canids/Reference/Genome/Ensemble/Canis_familiaris.CanFam3.1.dna.toplevel.fa
Reading reference sizes
  Time reading reference sizes: 00:00:17
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:13


But it has now been sitting at this last step ('Time to join reference sequences') for more than 12 hours.
The .fa file appears to be formatted correctly:

>1 dna:chromosome chromosome:CanFam3.1:1:1:122678785:1 REF
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTATGTGAGAAGATAGCTGAA
CGCCTTGTCCACATCATCTTACTGCTGAGAGTTGAGCTCACCCTCAGTCCCTCACAGTTC
CACACTGCCTGCAGAGTGAGTTTCCCATGTCTTCACCAGAGACTTTTGCCAGAGGCTTCT
GAGACGCAAGTTAACAATGCAGACCTGGAGGGTATCTCCAGGTGCAGTAGAGTGGTAATC
TCGGAACCTCCTGACTCAGAATACTGCTACCTTCACACTGTCATAAGAATGCAGCGAGTT
GAGAGCTGGCTTCTAGGCATGCTTCCTTTTGAGAGCTGAGGACAGGACAGAACCCTCCCG
CATCCTGCCTGACTGTAGACGTACCTGCTAACCTCCTCATGTTAGTGGCTGGGATAGATT
GTGGGAAAAGCATGTGTAAGCATTGGGCCTGAACTCCCGTGTATCTGAGTTGAATACAGC
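The first lines above only show the start of chromosome 1. An Ensembl "toplevel" FASTA also contains the unplaced scaffolds, so a per-record summary can help confirm what hisat2-build is actually being fed. A minimal sketch (the function name summarize_fasta is hypothetical) that reports each record's name, taken as the first whitespace-delimited token of the header (presumably the part the annotation files' chromosome column, e.g. 1 or X, has to match), together with its length and fraction of N bases:

```python
from typing import Dict, Iterable, Tuple


def summarize_fasta(lines: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Map each FASTA record name to (sequence length, fraction of N/n bases)."""
    lengths: Dict[str, int] = {}
    n_counts: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Record name = first whitespace-delimited token after '>'
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            lengths[name] = 0
            n_counts[name] = 0
        elif name is not None:
            lengths[name] += len(line)
            n_counts[name] += line.count("N") + line.count("n")
    return {
        seq: (length, n_counts[seq] / length if length else 0.0)
        for seq, length in lengths.items()
    }


if __name__ == "__main__":
    # Tiny made-up records; for the real file, pass open("...toplevel.fa")
    demo = [
        ">1 dna:chromosome chromosome:CanFam3.1:1:1:122678785:1 REF\n",
        "NNNNACGT\n",
        "ACGT\n",
        ">X dna:chromosome\n",
        "ACGTACGT\n",
    ]
    for seq, (length, n_frac) in summarize_fasta(demo).items():
        print(f"{seq}\t{length}\t{n_frac:.2%}")
```

Because it streams line by line, it can be pointed at the full genome file without loading it into memory; note that a duplicated record name would silently overwrite the earlier entry in this sketch.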

So does the GTF file from which the intron and exon files were created:

X       ensembl gene    1575    5716    .       +       .       gene_id "ENSCAFG00000010935"; gene_version "3"; gene_source "ensembl"; gene_biotype "protein_coding";
X       ensembl transcript      1575    5716    .       +       .       gene_id "ENSCAFG00000010935"; gene_version "3"; transcript_id "ENSCAFT00000017396"; transcript_version "3"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_source "ensembl"; transcript_biotype "protein_coding";
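The two lines shown are gene and transcript records; splice sites come from the exon records of each transcript. For reference, here is a minimal sketch of how intron boundaries can be derived from a GTF: group exon lines by transcript_id, sort them, and emit the gap between each pair of consecutive exons. It is modeled loosely on HISAT2's hisat2_extract_splice_sites.py, which applies some extra filtering not reproduced here, and the 0-based left/right convention is an assumption worth checking against that script's actual output. The function name splice_sites_from_gtf is hypothetical, and real GTF columns are tab-separated (the lines pasted above show spaces only because of copying).

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Matches e.g. transcript_id "ENSCAFT00000017396";
TRANSCRIPT_RE = re.compile(r'transcript_id "([^"]+)"')


def splice_sites_from_gtf(lines: Iterable[str]) -> List[Tuple[str, int, int, str]]:
    """Return sorted, unique (chrom, left, right, strand) intron boundaries.

    left/right are 0-based: the last base of one exon and the first base
    of the next exon (assumed convention, see the lead-in).
    """
    exons: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    info: Dict[str, Tuple[str, str]] = {}  # transcript_id -> (chrom, strand)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue  # only exon records carry exon coordinates
        match = TRANSCRIPT_RE.search(fields[8])
        if match is None:
            continue
        tid = match.group(1)
        start, end = int(fields[3]), int(fields[4])  # GTF: 1-based, inclusive
        exons[tid].append((start, end))
        info[tid] = (fields[0], fields[6])
    junctions = set()
    for tid, spans in exons.items():
        chrom, strand = info[tid]
        spans.sort()
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            if next_start > prev_end + 1:  # skip touching/overlapping exons
                # Convert 1-based GTF positions to 0-based offsets
                junctions.add((chrom, prev_end - 1, next_start - 1, strand))
    return sorted(junctions)


if __name__ == "__main__":
    # Made-up exon coordinates for illustration only
    attrs = 'gene_id "ENSCAFG00000010935"; transcript_id "ENSCAFT00000017396";'
    gtf = [
        f"X\tensembl\texon\t1575\t1700\t.\t+\t.\t{attrs}\n",
        f"X\tensembl\texon\t2000\t2100\t.\t+\t.\t{attrs}\n",
    ]
    for chrom, left, right, strand in splice_sites_from_gtf(gtf):
        print(chrom, left, right, strand, sep="\t")
```

Collecting junctions in a set means an intron shared by several transcripts of the same gene is written only once.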

Can anyone help me figure out why building this index is taking so much longer than it has in the past?

Thank you for your help!

Is it still running? You can check with the top command in a new terminal window.
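top shows whether the process is alive; on Linux, /proc/<pid>/status additionally reports its scheduler state and memory use, including how much has been pushed out to swap. That is a quick way to judge whether a long-running build is still computing or is starved for RAM. A minimal, Linux-only sketch; the function names parse_proc_status and read_proc_status are hypothetical, and the PID is whatever top reports for the busy hisat2-build process.

```python
import sys
from typing import Dict

# Fields of interest in /proc/<pid>/status (see proc(5))
FIELDS = ("Name", "State", "VmPeak", "VmRSS", "VmSwap")


def parse_proc_status(text: str) -> Dict[str, str]:
    """Extract selected 'Key:<whitespace>value' lines from /proc/<pid>/status."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in FIELDS:
            result[key] = value.strip()
    return result


def read_proc_status(pid: str) -> Dict[str, str]:
    """Read and parse /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_proc_status(handle.read())


if __name__ == "__main__":
    # Usage: python proc_status.py <pid>; defaults to this script's own process
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    try:
        for key, value in read_proc_status(pid).items():
            print(f"{key}: {value}")
    except OSError as err:
        print(f"could not read /proc/{pid}/status: {err}")
```

Run it a few times a minute apart: a State of R (running) with VmRSS changing suggests the build is still working, while a large, growing VmSwap or a process that stays in D (uninterruptible sleep) may point towards memory pressure.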

Yes, it does appear to still be running.

Have you solved this yet? I am running into the same problem.
