Hey,
I want to align RNA-seq FASTQ files to a reference genome with HISAT2, using known splice sites. First, I downloaded the reference genome and built an index with hisat2-build:
wget ftp://ftp.ensembl.org/pub/release-102/fasta/danio_rerio/dna/Danio_rerio.GRCz11.dna.primary_assembly.fa.gz
gzip -d Danio_rerio.GRCz11.dna.primary_assembly.fa.gz
mv Danio_rerio.GRCz11.dna.primary_assembly.fa genome.fa
hisat2-build -p 16 genome.fa genome
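A quick way to confirm that the index step finished is to check that every index file was written. This is a minimal sketch, assuming HISAT2's usual layout of eight files named <base>.1.ht2 through <base>.8.ht2 (.ht2l for very large genomes); check_ht2_index is a hypothetical helper, not part of HISAT2.

```shell
# Sketch: confirm that hisat2-build wrote a complete index for a basename.
# Assumes the usual eight files <base>.1.ht2 ... <base>.8.ht2, or the .ht2l
# variant used for very large genomes. check_ht2_index is a hypothetical name.
check_ht2_index() {
    local base="$1" ext i missing
    for ext in ht2 ht2l; do
        missing=0
        for i in 1 2 3 4 5 6 7 8; do
            # -s: file exists and is non-empty
            [ -s "${base}.${i}.${ext}" ] || missing=$((missing + 1))
        done
        if [ "$missing" -eq 0 ]; then
            echo "complete ${ext} index: ${base}"
            return 0
        fi
    done
    echo "incomplete or missing index: ${base}" >&2
    return 1
}
```

With the basename used above, this would be run as check_ht2_index genome.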
Then I downloaded the corresponding GTF file and created the HISAT2-specific splice-site file:
wget ftp://ftp.ensembl.org/pub/release-102/gtf/danio_rerio/Danio_rerio.GRCz11.102.gtf.gz
gzip -d Danio_rerio.GRCz11.102.gtf.gz
mv Danio_rerio.GRCz11.102.gtf genome.gtf
hisat2_extract_splice_sites.py genome.gtf > genome.txt
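A minimal sketch for sanity-checking the splice-site file, assuming the four tab-separated columns that hisat2_extract_splice_sites.py normally writes (chromosome, left, right, strand). It also counts chromosome names that have no matching FASTA header in the genome, since mismatched names would make the sites unusable. check_splice_sites is a hypothetical helper name.

```shell
# Sketch: check the splice-site file against its expected format and the genome FASTA.
# Assumes four tab-separated columns: chromosome, left, right, strand (+ or -).
# check_splice_sites is a hypothetical helper name.
check_splice_sites() {
    local ss="$1" fasta="$2" bad unknown names
    # Lines with the wrong field count, non-numeric coordinates, or an invalid strand.
    bad=$(awk -F'\t' 'NF != 4 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || ($4 != "+" && $4 != "-") { n++ }
        END { print n + 0 }' "$ss")
    # Sequence names from the FASTA headers (first word after ">"), stored in a temp file.
    names=$(mktemp)
    grep '^>' "$fasta" | cut -c2- | awk '{ print $1 }' > "$names"
    # Distinct chromosome names in the splice file that match no FASTA header exactly.
    unknown=$(cut -f1 "$ss" | sort -u | grep -vxF -f "$names" | wc -l | tr -d ' ')
    rm -f "$names"
    echo "malformed lines: $bad"
    echo "chromosomes not in FASTA: $unknown"
    [ "$bad" -eq 0 ] && [ "$unknown" -eq 0 ]
}
```

For the files above this would be check_splice_sites genome.txt genome.fa. Both files come from the same Ensembl release, so the second count is expected to be zero.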
Finally, I tried to align the paired-end FASTQ files against the index:
hisat2 --dta --known-splicesite-infile genome.txt -x genome -1 fastq/mut_1_1.fastq.gz -2 fastq/mut_1_2.fastq.gz > hisat2/mut_1.sam
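Before rerunning the alignment, the gzipped read pair can be checked for corruption and for matching read counts. This is a sketch, assuming standard four-line FASTQ records; check_fastq_pair is a hypothetical helper name.

```shell
# Sketch: pre-flight check of a gzipped paired-end FASTQ pair.
# gzip -t tests archive integrity; the read count assumes 4-line FASTQ records.
# check_fastq_pair is a hypothetical helper name.
check_fastq_pair() {
    local r1="$1" r2="$2" f n1 n2
    for f in "$r1" "$r2"; do
        gzip -t "$f" || { echo "corrupt gzip: $f" >&2; return 1; }
    done
    # Reads = lines / 4 for standard FASTQ.
    n1=$(( $(gzip -dc "$r1" | wc -l) / 4 ))
    n2=$(( $(gzip -dc "$r2" | wc -l) / 4 ))
    echo "reads: $n1 / $n2"
    # Mates must pair up one-to-one.
    [ "$n1" -eq "$n2" ]
}
```

For the sample above: check_fastq_pair fastq/mut_1_1.fastq.gz fastq/mut_1_2.fastq.gz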
The output file is created, but every time it reaches 4.0 GB, the run aborts with this error message:
Error while flushing and closing output
terminate called after throwing an instance of 'int'
Aborted (core dumped)
(ERR): hisat2-align exited with value 134
From what I found when searching for this error, it might be caused by running out of space. I checked the storage: the internal hard drive has 75 GB free, and the external drive that holds all my files has nearly 400 GB left. That should be enough.
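Because the crash always happens at the same 4.0 GB mark, it may be worth checking which file system the output directory is on, not only how much space is free: FAT32, a common format for external drives, cannot store a single file larger than 4 GiB minus 1 byte (4294967295 bytes), however much free space remains. This sketch assumes GNU coreutils on Linux (stat -f -c %T); at_fat32_limit and report_output_fs are hypothetical helper names.

```shell
# Sketch: report the file system of an output directory and compare a file's size
# with the FAT32 per-file ceiling of 4 GiB - 1 byte.
# at_fat32_limit and report_output_fs are hypothetical helper names.
FAT32_MAX_BYTES=4294967295

# Succeeds (exit 0) if a byte count has reached the FAT32 per-file ceiling.
at_fat32_limit() {
    [ "$1" -ge "$FAT32_MAX_BYTES" ]
}

# Prints file-system type and free space for a directory, plus the size of an
# optional file. `stat -f -c %T` is GNU coreutils syntax; FAT volumes
# typically report "msdos" here.
report_output_fs() {
    local dir="$1" file="${2:-}" size
    echo "type: $(stat -f -c %T "$dir")"
    # -P keeps each file system on one line, so row 2 holds the values.
    df -hP "$dir" | awk 'NR == 2 { print "free: " $4 }'
    if [ -n "$file" ] && [ -f "$file" ]; then
        size=$(wc -c < "$file" | tr -d ' ')
        echo "size: $size bytes"
        if at_fat32_limit "$size"; then echo "at FAT32 file-size limit"; fi
    fi
}
```

For the run above this would be report_output_fs hisat2 hisat2/mut_1.sam.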
Can you help me understand and fix this error?
Is it intentional that the splice file ends with .ss but you feed a .txt to hisat?
Thanks for this hint. I corrected the code above. I was wondering whether the splice-site file is causing the error, so I tried to align the fastq files against the reference genome without it. But this leads to the same error as above.