Query about memory usage of HISAT2
boymin2020, 4.6 years ago

Hi all, I have recently been working with a batch of silkworm (Bombyx mori) RNA-seq data and have run into an error I cannot debug. Below is my workflow.

1. The silkworm genome (silkDB 3.0) is about 468.3 Mb across 28 chromosomes.

2. The Linux server I am using has 288 cores and 1 TB of memory.

3. Building the index files with hisat2-build worked without any problems.

4. The hisat2 alignment step always fails. Below is an example command. After the job is submitted, its memory usage (%mem) keeps increasing.

hisat2 -t -p 30 --dta -x /home/RNAseq_2/source/silkworm/index/silkworm_tran -1 /data/storage04/RNAseq_2/silkworm/majorbio/data4antivirus/cleandata/306D3D1a_R1-clean.fastq.gz -2 /data/storage04/RNAseq_2/silkworm/majorbio/data4antivirus/cleandata/306D3D1a_R2-clean.fastq.gz -S /data/storage04/RNAseq_2/silkworm/majorbio/data4antivirus/alignedFromHisat2Results/306D3D1a.sam

5. The final SAM file is expected to be about 22 GB, but %mem already reaches 55 while the SAM file is only 7.8 GB.

6. When I tried running 5 similar jobs with 8 cores per job, I got the following error message:

(ERR): hisat2-align died with signal 9 (KILL)
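A note on point 6: a process cannot catch signal 9 (SIGKILL), so hisat2-align was terminated from outside. On a Linux server without a scheduler, a common sender is the kernel's out-of-memory (OOM) killer, which would be consistent with the steadily rising %mem in points 4 and 5. The sketch below assumes a Linux host with /proc and access to the kernel log (dmesg may need root); `signal_name`, `oom_events` and `rss_kb` are hypothetical helpers, not part of HISAT2.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for diagnosing "died with signal 9 (KILL)".
# Assumes a Linux host; none of these are part of HISAT2.

# Print the name of a signal number (9 -> KILL).
signal_name() {
    kill -l "$1"
}

# Filter kernel log text (read from stdin) for OOM-killer events.
# Typical usage: dmesg -T | oom_events   (dmesg may require root)
oom_events() {
    grep -iE 'out of memory|killed process|oom-kill'
}

# Print the resident set size (VmRSS, in kB) of a running process.
rss_kb() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}
```

For example, run `dmesg -T | oom_events` right after a failed run, or `for pid in $(pgrep -f hisat2-align); do echo "$pid $(rss_kb "$pid") kB"; done` periodically while a job is running to see how fast its memory grows.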

I have searched extensively without any progress. Could you help me figure out the issue and speed up the job?

Thanks in advance,


Based on those specs, the node should be more than capable of this task. Are you using a scheduler such as SLURM? If so, please post the header lines of the submission script. Most likely you did not request enough memory and the scheduler killed the job.


Thanks for the quick reply. No scheduler is installed on the server, so I submit jobs with nohup, for example:

nohup bash ${id}_hisat2.sh > ${outDir}/shell/logerr/${id}_hisat2.nohup-logerr 2>&1 &


Please run it on a single sample as a plain command outside of that script: no nohup, not sent to the background, and without redirecting any streams. That will show where to start debugging. For testing, you can also run it on just a subset of the reads.
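One way to test on a subset is to take the first N read pairs from each mate file. FASTQ records are four lines each, and both mate files list pairs in the same order, so taking the same number of lines from R1 and R2 keeps the pairs in sync. This is a minimal sketch; `subset_pair` and the output names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the first N read pairs of a gzipped
# paired-end FASTQ to new gzipped files for quick test runs.
# Usage: subset_pair R1.fastq.gz R2.fastq.gz N OUT_PREFIX
subset_pair() {
    local r1=$1 r2=$2 n=$3 out=$4
    local lines=$((n * 4))   # 4 lines per FASTQ record
    gzip -dc "$r1" | head -n "$lines" | gzip > "${out}_R1.fastq.gz"
    gzip -dc "$r2" | head -n "$lines" | gzip > "${out}_R2.fastq.gz"
}
```

For example, `subset_pair 306D3D1a_R1-clean.fastq.gz 306D3D1a_R2-clean.fastq.gz 100000 test` would write test_R1.fastq.gz and test_R2.fastq.gz, which can then be passed to the same hisat2 command via -1 and -2.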


I re-ran it on a single sample as a plain command, without nohup, on my laptop (8 cores, 16 GB of memory) and got the same error. I then checked the original FASTQ files. The fastp QC report shows a big difference in the adapter section between samples that aligned successfully (~4 GB) and samples that failed (~6 GB).

Report for a successful sample

Adapter or bad ligation of read1. The input has little adapter percentage (~0.247438%), probably it's trimmed before.

Sequence Occurrences

  1. A 3051
  2. G 2336
  3. T 4015
  4. other adapter sequences 206857

Adapter or bad ligation of read2. The input has little adapter percentage (~0.246621%), probably it's trimmed before.

Sequence Occurrences

  1. A 3086
  2. G 2277
  3. T 4063
  4. other adapter sequences 206838

Report for a failed sample

Adapter or bad ligation of read1

Sequence Occurrences

  1. A 14240
  2. AG 13894
  3. AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCGTGGAAATCTCGTATGCCGTCTTCTGCTT 42850
  4. AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCGTGGAAATCTCGTATGCCGTCTTCTGCTTGAAAA 47767
  5. AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCGTGGAAATCTCGTATGCCGTCTTCTGCTTGAAAAA 15307
  6. other adapter sequences 1232207

Adapter or bad ligation of read2

Sequence Occurrences

  1. A 14233
  2. AG 13953
  3. AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTGCGTCTATGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAA 15624
  4. other adapter sequences 1322715

What I can tell is that these two samples come from different batches, one of which had already been trimmed before I received it. I still do not know how to debug this, so any advice is appreciated.
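To compare batches directly, one could count how many reads still contain the start of the adapter. AGATCGGAAGAGC is the prefix shared by every full adapter sequence fastp listed for the failed sample. This is a minimal sketch; `adapter_reads` is a hypothetical helper, and a plain substring match ignores adapters truncated before this 13-base prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads whose sequence line contains an
# adapter fragment. Default: AGATCGGAAGAGC, the prefix shared by the
# adapter sequences fastp reported for the failed sample.
# Usage: adapter_reads sample.fastq.gz [ADAPTER]
adapter_reads() {
    local fq=$1 adapter=${2:-AGATCGGAAGAGC}
    # Line 2 of every 4-line FASTQ record is the sequence.
    gzip -dc "$fq" |
        awk -v a="$adapter" 'NR % 4 == 2 && index($0, a) > 0 {n++} END {print n + 0}'
}
```

Running it on the R1 files of a successful and a failed sample gives a direct comparison. If the failed batch still carries adapters, re-trimming both mates together with fastp before alignment would be a reasonable next step (fastp's `--detect_adapter_for_pe` option enables adapter auto-detection for paired-end input).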


Did you verify that the index you created is good? Is this HISAT2 installation otherwise known to work? You have more than enough hardware for this to work (assuming nothing else is consuming that capacity while these jobs run).
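A quick first check on the index is that every file hisat2-build writes is present and non-empty. For a standard (small) index that is eight files, PREFIX.1.ht2 through PREFIX.8.ht2; large indexes use .ht2l instead, which this minimal sketch does not handle. `check_ht2_index` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a HISAT2 index prefix has all eight
# non-empty files of a standard (small) index: PREFIX.1.ht2 ... PREFIX.8.ht2.
# Large-index (.ht2l) files are not handled here.
# Usage: check_ht2_index /path/to/index/prefix
check_ht2_index() {
    local prefix=$1 i missing=0
    for i in 1 2 3 4 5 6 7 8; do
        if [ ! -s "${prefix}.${i}.ht2" ]; then
            echo "missing or empty: ${prefix}.${i}.ht2" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_ht2_index /home/RNAseq_2/source/silkworm/index/silkworm_tran && echo OK`. Running `hisat2-inspect -s` on the same prefix prints a summary of the index and is another quick sanity check.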


Yes, I have successfully aligned three samples from the same batch. PS: they have similar file sizes and were also pre-processed with fastp.


Do you get anything else printed after:

hisat2-align died with signal 9 (KILL)

Signal 9 (SIGKILL) means the process was forcibly terminated from outside; the program cannot catch it, and on Linux a common sender is the kernel's out-of-memory (OOM) killer. Either way, something is not right. If other samples have aligned fine with HISAT2 on this machine, I would suggest checking whether the FASTQ files for this particular sample are corrupt. It may be best to re-process the originals and see whether you have better luck with newly made files. I hope you are trimming the paired-end files together.
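A few cheap corruption checks on a gzipped pair can be scripted before re-processing anything: both files should pass gzip's integrity test, each should hold a whole number of 4-line records, and both mates should contain the same number of reads. This is a minimal sketch; `gz_lines` and `check_pair` are hypothetical helpers, and they do not verify that read names match between mates.

```shell
#!/usr/bin/env bash
# Print the number of lines in a gzipped file.
gz_lines() {
    gzip -dc "$1" | wc -l
}

# Hypothetical helper: basic sanity checks on a gzipped paired-end FASTQ.
# Usage: check_pair R1.fastq.gz R2.fastq.gz
check_pair() {
    local r1=$1 r2=$2 f l1 l2
    # 1) Both files must be intact gzip streams.
    for f in "$r1" "$r2"; do
        if ! gzip -t "$f" 2>/dev/null; then
            echo "corrupt gzip stream: $f" >&2
            return 1
        fi
    done
    l1=$(gz_lines "$r1")
    l2=$(gz_lines "$r2")
    # 2) Each file must contain whole 4-line FASTQ records.
    if [ $((l1 % 4)) -ne 0 ] || [ $((l2 % 4)) -ne 0 ]; then
        echo "line count is not a multiple of 4 (truncated record?)" >&2
        return 1
    fi
    # 3) Both mates must have the same number of reads.
    if [ "$l1" -ne "$l2" ]; then
        echo "mates differ: $((l1 / 4)) vs $((l2 / 4)) reads" >&2
        return 1
    fi
    echo "OK: $((l1 / 4)) read pairs"
}
```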


Good advice. I am checking the original FASTQ files. Perhaps the transfer from my laptop in the USA to the Linux server in China corrupted them.
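Transfer corruption can be ruled in or out by hashing the files on the sending machine and re-checking the copies on the server. This is a minimal sketch; `make_checksums` and `verify_checksums` are hypothetical wrappers. md5sum is part of GNU coreutils, so on a macOS laptop it may need to be installed or replaced with an equivalent tool.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking that files survived a transfer.

# On the source machine: write MD5 checksums for the given files.
# Paths are recorded as given, so run from the directory holding them.
# Usage: make_checksums checksums.md5 file1 [file2 ...]
make_checksums() {
    local out=$1
    shift
    md5sum "$@" > "$out"
}

# On the destination machine, from the directory holding the copies:
# re-hash every listed file and compare; non-zero exit on any mismatch.
# Usage: verify_checksums checksums.md5
verify_checksums() {
    md5sum -c "$1"
}
```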
