HISAT2 running issue (output getting slower)
2.2 years ago
jian227 • 0

Hi, I tried to align my reads to the reference with the command below:

hisat2 -p 16 -x /genome_sp_c --dta --very-sensitive -1 R_1.fq -2 R_2.fq -S test.sam --summary-file summary_txt

My FASTQ files are 60 GB in total for R1 and R2. HISAT2 wrote the first 10 GB of the SAM file in about 10 minutes, then took around 20 minutes for the next 10 GB. The output kept slowing down, roughly exponentially, and by the time the SAM file reached about 50 GB it was growing by only several MB per hour, so the job could never finish.

Could anyone help me with this? One solution I am considering is to cut my FASTQ files into pieces, run HISAT2 on each piece, and then piece the SAM files back together. Would that be workable if there is no way to fix the slowdown?

Many thanks in advance!

Hisat2 alignemnt • 1.1k views

Strange problem. Do you have enough memory? You could try reducing the number of threads first, for example to 8.


Thank you for your reply! I reduced the number of threads and split my FASTQ files into smaller pieces, and the mapping finished.

2.2 years ago

In general, for large files it is a good habit to split the FASTQ files and run the alignment on each piece; then you can merge the resulting sorted BAM files with

samtools merge

prints:

Usage: samtools merge [options] -o <out.bam> [options] <in1.bam> ... <inN.bam>
or: samtools merge [options] <out.bam> <in1.bam> ... <inN.bam>

The reasons are that you often get better parallel performance, it protects against memory leaks, and you can keep track of progress much better.
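
A rough sketch of the align-then-merge step could look like the following. It assumes the FASTQ chunks already exist, that hisat2 and samtools are on your PATH, and that your samtools version accepts the -o form of merge shown in the usage above. The function names, file names and thread counts are made up for illustration; the HISAT2 options are simply copied from the question. Each chunk is piped straight into samtools sort, since samtools merge expects sorted input.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Align one pair of FASTQ chunks and write a coordinate-sorted BAM.
# Arguments: index prefix, R1 chunk, R2 chunk, output BAM, threads (default 8).
# (Hypothetical helper; the HISAT2 options mirror the command in the question.)
align_chunk() {
    local index="$1" r1="$2" r2="$3" out_bam="$4" threads="${5:-8}"
    hisat2 -p "$threads" -x "$index" --dta --very-sensitive \
        -1 "$r1" -2 "$r2" --summary-file "${out_bam%.bam}.summary.txt" \
      | samtools sort -@ "$threads" -o "$out_bam" -
}

# Merge the sorted per-chunk BAM files into a single BAM.
# Arguments: merged output BAM, then one or more sorted input BAMs.
merge_chunks() {
    local merged="$1"
    shift
    samtools merge -@ 4 -o "$merged" "$@"
}

# Example calls (hypothetical chunk names):
#   align_chunk /genome_sp_c R_1.part_aaa R_2.part_aaa part_aaa.bam 8
#   align_chunk /genome_sp_c R_1.part_aab R_2.part_aab part_aab.bam 8
#   merge_chunks test.bam part_aaa.bam part_aab.bam
```

Because every chunk produces its own BAM and summary file, a chunk that stalls or fails can be rerun on its own without repeating the rest.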

You can split files with the Unix split command or with seqkit split.
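
For the plain Unix split route, a minimal sketch is below. It assumes uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines); the function name, chunk size and output prefix are hypothetical. The line count per chunk has to be a multiple of 4 so that no record is cut in half.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Split a FASTQ file into chunks that each hold a whole number of reads.
# Arguments: input FASTQ, reads per chunk, output prefix.
# Chunks are named <prefix>aaa, <prefix>aab, ... (3-letter suffixes from split -a 3).
split_fastq() {
    local fastq="$1" reads_per_chunk="$2" prefix="$3"
    # 4 lines per FASTQ record, so multiply the read count by 4.
    split -l "$((reads_per_chunk * 4))" -a 3 "$fastq" "$prefix"
}

# Example calls (hypothetical chunk size of 10 million reads):
#   split_fastq R_1.fq 10000000 R_1.part_
#   split_fastq R_2.fq 10000000 R_2.part_
```

Splitting R1 and R2 with the same number of reads per chunk keeps mates in matching chunks (R_1.part_aaa with R_2.part_aaa, and so on), provided both files list the reads in the same order.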
