RefSeq GCF_000001405.40 and hisat2-build (--ss and --exon)
chrisk, 8 weeks ago

Hello Biostars community,

We would like to know whether anyone has been able to build a HISAT2 index with hisat2-build using --ss (splice sites) and --exon on the RefSeq files listed below. If you have run into related issues in the past, any advice or lessons learned would also be appreciated.

Current pain point:

  • hisat2-build stalls at Generation 4 for 20 hours (log below), even though top shows that the process is still running.
  • The build from the same RefSeq files (GCF_000001405.40_GRCh38.p14) completes successfully without --ss and --exon.
  • In the past, we have successfully built indexes with these settings (--exon and --ss) from the GENCODE v42/41/39 comprehensive FASTA and GTF files.

Setup: HISAT2 version 2.2.1 (latest)

RefSeq files:
  • GCF_000001405.40_GRCh38.p14_genomic.fna.gz (2024-08-27 09:57, 928M)
  • GCF_000001405.40_GRCh38.p14_genomic.gtf.gz (2024-08-27 09:57, 54M)

Machine: local VM with 20 CPU cores, 256 GB RAM, and 1.6 TB of disk space.

Exon extraction: hisat2_extract_exons.py GCF_000001405.40_GRCh38.p14_genomic.sorted.gtf > Original_refseq1405.40_extractexon

Splice-site extraction: hisat2_extract_splice_sites.py GCF_000001405.40_GRCh38.p14_genomic.sorted.gtf > Original_refseq1405.40_extractsplice

Index build: hisat2-build -p 4 --exon Original_refseq1405.40_extractexon --ss Original_refseq1405.40_extractsplice GCF_000001405.40_GRCh38.p14_genomic.fna HISAT_RefSeq1405_40_Full_Index_SS_Exon

(We also tested without setting the number of threads with -p.)
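Before starting a long --ss/--exon build, one cheap check is that every sequence name in the extracted splice-site and exon files also appears as a FASTA header. Below is a minimal bash sketch of that check. The function name check_annotation_names is hypothetical, and it assumes the tab-separated output of hisat2_extract_splice_sites.py and hisat2_extract_exons.py has the sequence name in the first column. It only checks input consistency and is not a diagnosis of the Generation 4 stall.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report sequence names used in HISAT2 splice-site /
# exon files that do not appear as FASTA headers.
# Assumes tab-separated input with the sequence name in column 1,
# as written by hisat2_extract_splice_sites.py / hisat2_extract_exons.py.
check_annotation_names() {
  local fasta="$1"; shift
  local names f total missing
  names=$(mktemp)
  # FASTA sequence name = first word of each '>' header line.
  awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$fasta" | LC_ALL=C sort -u > "$names"
  for f in "$@"; do
    # Number of records (lines) in the annotation file.
    total=$(wc -l < "$f" | tr -d ' ')
    # Names present in the annotation file but absent from the FASTA,
    # joined with commas (empty when every name matches).
    missing=$(cut -f1 "$f" | LC_ALL=C sort -u | LC_ALL=C comm -23 - "$names" | paste -sd, -)
    echo "$f: ${total} records; missing from FASTA: ${missing:-none}"
  done
  rm -f "$names"
}

# Example, using the file names from the commands above:
# check_annotation_names GCF_000001405.40_GRCh38.p14_genomic.fna \
#   Original_refseq1405.40_extractsplice Original_refseq1405.40_extractexon
```

Both sides are sorted under LC_ALL=C because comm requires its two inputs to be sorted in the same collation order.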

Thanks in advance, Chris

Log output:

Settings:
  Output files: "HISAT_RefSeq1405_40_Full_Index/HISAT_RefSeq1405_40_Full_Index_SS_Exon.*.ht2"
  Line rate: 7 (line is 128 bytes)
  Lines per side: 1 (side is 128 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Local offset rate: 3 (one in 8)
  Local fTable chars: 6
  Local sequence length: 57344
  Local sequence overlap between two consecutive indexes: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  GCF_000001405.40_GRCh38.p14_genomic.fna
Reading reference sizes
  Time reading reference sizes: 00:00:42
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:13
  Time to read SNPs and splice sites: 00:00:02
Generation 0 (3137151402 -> 3137151402 nodes, 0 ranks)
COUNTED NEW NODES: 8
COUNTED TEMP NODES: 0
RESIZED NODES: 20
RESIZED NODES: 0
MADE NEW NODES: 22
Generation 1 (3137468813 -> 3137468813 nodes, 0 ranks)
COUNTED NEW NODES: 6
COUNTED TEMP NODES: 0
RESIZED NODES: 19
RESIZED NODES: 0
MADE NEW NODES: 23
Generation 2 (3138104089 -> 3138104089 nodes, 0 ranks)
COUNTED NEW NODES: 6
COUNTED TEMP NODES: 0
RESIZED NODES: 20
RESIZED NODES: 0
MADE NEW NODES: 23
Generation 3 (3139375392 -> 3139375392 nodes, 0 ranks)
BUILT FROM_INDEX: 17
COUNTED NEW NODES: 6
COUNTED TEMP NODES: 0
RESIZED NODES: 20
RESIZED NODES: 0
MADE NEW NODES: 24
RESIZE NODES: 68
COUNT NUMBER IN EACH BIN: 14
FINISHED FIRST ROUND: 26
26 789568741
103 841602280
67 786479349
170 724271090
FINISHED RECURSIVE SORTS: 87
SORT NODES: 127
MERGE, UPDATE RANK: 69
Generation 4 (3141921460 -> 3141356782 nodes, 1139244126 ranks)
ALLOCATE FROM_TABLE: 32
COUNT NUMBER IN EACH BIN: 13
FINISHED FIRST ROUND: 36
94 789748680
94 789476690
93 781444256
93 780687156
FINISHED RECURSIVE SORTS: 71
BUILD TABLE: 120
BUILD INDEX: 18
82 nodes, 1139244126 ranks)
ALLOCATE FROM_TABLE: 32
COUNT NUMBER IN EACH BIN: 13
FINISHED FIRST ROUND: 36

Hello Biostars community,

For anyone else affected by this issue: the problem went away when we ran the build on a bare-metal Linux installation instead of the VM. We did not hear back from the HISAT2 developers.

Happy to share further details if anyone wants to reach out. Chris
