Question

HISAT2 running issue (output getting slower)

0

2.2 years ago

jian227 • 0

Hi, I tried to align my reads to the reference using the command below:

hisat2 -p 16 -x /genome_sp_c --dta --very-sensitive -1 R_1.fq -2 R_2.fq -S test.sam --summary-file summary_txt

My FASTQ files total 60 GB for R1 and R2. HISAT2 wrote the first 10 GB of the SAM file in about 10 minutes, then took around 20 minutes to write the next 10 GB. It kept slowing down, seemingly exponentially, and by the time the SAM file reached about 50 GB the output rate had dropped to several MB per hour, so the job never finished.

Could anyone help me with this? One potential solution I am considering: could I cut my FASTQ files into pieces, run HISAT2 on each piece, and then piece the SAM files back together? Would that be a workable solution if there is no way to fix the slowdown?

Many thanks in advance!

Hisat2 alignment • 1.1k views

ADD COMMENT • link 2.1 years ago by jian227 • 0

2

Strange problem. Do you have enough memory? Maybe try reducing the number of threads first, for example to 8.

2.2 years ago by MatthewP ★ 1.4k

0

Thank you for your reply! I reduced the number of threads and split my FASTQ files into smaller pieces, and the mapping finished.

2.1 years ago by jian227 • 0

score 1 · Answer 1 · 2022-09-20

In general, for large files it is a good habit to split the FASTQ files and run the alignment on each piece; you can then merge the resulting sorted BAM files with

samtools merge

prints:

Usage: samtools merge [options] -o <out.bam> [options] <in1.bam> ... <inN.bam>
or: samtools merge [options] <out.bam> <in1.bam> ... <inN.bam>

The reasons for splitting are that you often get better parallel performance, it protects against memory leaks, and you can keep track of progress much better.

You can split the files with the Unix split command or with seqkit split.
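
As a rough sketch of that workflow, something like the following could work. The input names and index path are taken from the question; the function names, the chunks/ directory, the chunk size, and the thread counts are made up for illustration. It assumes plain 4-line FASTQ records (no wrapped sequences) and a samtools version that supports merge -o, as in the usage shown above. Both mate files must be split with the same chunk size so that the pairs stay in sync.

```shell
#!/bin/sh
# Sketch only: split paired FASTQ files, align each chunk, merge sorted BAMs.
# Function names, directory layout, and chunk size are illustrative assumptions.

# Split one FASTQ file into pieces holding reads_per_chunk reads each.
# A FASTQ record is 4 lines, so the line count is kept a multiple of 4.
# Output pieces are named <prefix>aaa, <prefix>aab, ...
split_fastq() {
    input=$1; prefix=$2; reads_per_chunk=$3
    split -l $((reads_per_chunk * 4)) -a 3 "$input" "$prefix"
}

# Align every R1_/R2_ chunk pair in a directory with the options from the
# question, piping the SAM output straight into a coordinate-sorted BAM.
align_chunks() {
    index=$1; dir=$2; threads=$3
    for r1 in "$dir"/R1_*; do
        suffix=${r1##*/R1_}          # e.g. "aaa"
        r2="$dir/R2_$suffix"         # matching mate chunk
        hisat2 -p "$threads" -x "$index" --dta --very-sensitive \
            -1 "$r1" -2 "$r2" --summary-file "$dir/summary_$suffix.txt" \
          | samtools sort -o "$dir/aln_$suffix.bam" -
    done
}

# Merge the per-chunk sorted BAM files into a single BAM.
merge_chunks() {
    dir=$1; out=$2; threads=$3
    samtools merge -@ "$threads" -o "$out" "$dir"/aln_*.bam
}

# Example run (commented out; adjust paths and sizes to your data):
# mkdir -p chunks
# split_fastq R_1.fq chunks/R1_ 5000000
# split_fastq R_2.fq chunks/R2_ 5000000
# align_chunks /genome_sp_c chunks 8
# merge_chunks chunks merged.bam 8
```

Because each chunk produces its own BAM and summary file, a chunk that stalls or fails can be rerun on its own without repeating the whole job. Note that the per-chunk summary files are separate, so overall alignment rates have to be added up from them.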