HISAT2 Indexing using annotation for Rattus_norvegicus
6.1 years ago
neranjan ▴ 70

Hi,

I am trying to create a HISAT2 index with annotation for the Rattus norvegicus (rat) genome, which I downloaded from Ensembl release 94.

I am currently running with 220GB of memory and 16 cores, which I assumed would be adequate. However, I cannot create the HISAT2 index; it gives the error

Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:16
  Time to read SNPs and splice sites: 00:00:04
        is not reverse-deterministic, so reverse-determinize...
  Ran out of memory; automatically trying more memory-economical parameters.
        is not reverse-deterministic, so reverse-determinize...

and eventually fails with

Could not find approrpiate bmax/dcv settings for building this index.
Switching to a packed string representation.
Total time for call to driver() for forward index: 08:45:56

The HISAT2 website does provide a rat index, but it was built without annotation.

I am using the command

hisat2-build -p 16 --exon ${EXON} --ss ${SPLICE} ${FASTA_File} ${BASE_NAME}

to create the index.

Any ideas are greatly appreciated.

Thanks

HISAT2 annotation Ensembl index alignment

Do you run on a cluster, and if so, what is the exact command, including the header lines for the scheduler? Did you request the entire memory of the node you are on?


No, the node has 256GB of memory and I asked for only 220GB of RAM. I never ask for the full amount, since the node needs some memory to work with. On previous occasions I have asked for only 200GB.

In a previous attempt at the same index I asked for 300GB of RAM on a node that had 512GB of memory, and that did not work either.


If you share the links to the necessary files, I can try to build it on a 3TB node if that helps you.


Yes, that would help me. Thank you very much for the help.

I am using the files hosted by the Ensembl database, and hisat2 version 2.1.0 to build the index. Below is the SLURM script I use; the memory, partition, and qos settings may change depending on the cluster and the scheduler being used.

#!/bin/bash
#SBATCH --job-name=hisat
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 16
#SBATCH --mem=220G
#SBATCH --partition=general
#SBATCH --qos=general
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err

#genome 
wget ftp://ftp.ensembl.org/pub/current_fasta/rattus_norvegicus/dna/Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa.gz
#GTF 
wget ftp://ftp.ensembl.org/pub/current_gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.94.gtf.gz


for file in *.gz; do
       gunzip -d "$file"
done
echo "=========== Unzip Done ================="

BASE_NAME="Rattus_norvegicus"
FASTA_File="Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa"
GTF="Rattus_norvegicus.Rnor_6.0.94.gtf"
SPLICE="splice_site"
EXON="exon"

module load hisat2/2.1.0
#create splice sites
hisat2_extract_splice_sites.py ${GTF} > ${SPLICE}

#create exon file
hisat2_extract_exons.py ${GTF} > ${EXON}

#build index
hisat2-build -p 16 --exon ${EXON} --ss ${SPLICE} ${FASTA_File} ${BASE_NAME}

#build large index if the above does not work
#hisat2-build -p 16 --large-index --exon ${EXON} --ss ${SPLICE} ${FASTA_File} ${BASE_NAME}

If the normal hisat2-build run does not work, you can also try building the large index using the commented command.
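The try-then-fall-back step above can be sketched as plain shell control flow. The two functions below are placeholders standing in for the real hisat2-build invocations (standard and --large-index); swap in the actual commands from the script.

```shell
#!/bin/bash
# Placeholder functions stand in for the two hisat2-build invocations so
# the retry logic itself can be followed; substitute the real commands.
build_standard() { return 1; }   # pretend the standard build ran out of memory
build_large()    { return 0; }   # pretend the --large-index build succeeded

if build_standard; then
    used="standard"
elif build_large; then
    used="large-index"
else
    used="neither"
fi
echo "index built with: $used"
```

With the real commands substituted, the elif branch only runs when the first build exits non-zero.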

Thanks again for the help.


I just started it and will come back once finished.


Thanks, I appreciate it. If it completes successfully, I would like to know how much memory it used.
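On SLURM clusters, the peak memory of a completed job can usually be read back with sacct; a minimal sketch (the job id is a placeholder, and MaxRSS reports the peak resident set size per job step):

```shell
#!/bin/bash
# Query peak memory (MaxRSS) and wall time for a finished SLURM job.
# JOBID is a placeholder; sacct may not exist outside a SLURM cluster,
# so fall back to a message instead of failing.
JOBID=123456
if command -v sacct >/dev/null 2>&1; then
    report=$(sacct -j "$JOBID" --format=JobID,MaxRSS,Elapsed)
else
    report="sacct not available on this machine"
fi
echo "$report"
```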


It finished without issues on a 1.5TB node and used about 500GB at peak. I am compressing it and uploading it to cloud storage now, and will share the download link once finished:

There it is: https://uni-muenster.sciebo.de/s/ztztgCWvQujnhjq


Thank you very much; I really appreciate you going out of your way to help. I was able to download the index from the link.

1.6G Rattus_norvegicus.1.ht2
654M Rattus_norvegicus.2.ht2
1.3M Rattus_norvegicus.3.ht2
651M Rattus_norvegicus.4.ht2
1.4G Rattus_norvegicus.5.ht2
663M Rattus_norvegicus.6.ht2
7.8M Rattus_norvegicus.7.ht2
1.6M Rattus_norvegicus.8.ht2

Again thank you very much.
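As a quick sanity check after downloading, one can confirm that all eight .ht2 files are present before aligning; a small sketch (check_index is a hypothetical helper name):

```shell
#!/bin/bash
# Count how many of the eight expected HISAT2 index files are missing
# for a given index base name; 0 means the index is complete.
check_index() {
    local base="$1" missing=0 i
    for i in 1 2 3 4 5 6 7 8; do
        [ -e "${base}.${i}.ht2" ] || missing=$((missing + 1))
    done
    echo "$missing"
}

echo "missing files: $(check_index Rattus_norvegicus)"
```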


You're very welcome :)


Found a solution: generate the index with more memory.

Cheers!


Hi,

I am having the same issue with HISAT2 indexing using annotation for Rattus norvegicus. I currently don't have access to a cluster with sufficient memory, and I am stuck with my transcriptome analyses. I have seen that @ATpoint made these indexes available, but the link is dead.

Would @ATpoint or any of you be able to share these indexes again?

Thank you in advance,


I do not have them anymore. Why don't you use a tool such as salmon to quantify directly against the transcriptome? It barely requires any memory.
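For the salmon route, a minimal index-then-quantify sketch (the Ensembl cDNA filename and read filenames are placeholders; -l A lets salmon infer the library type):

```shell
#!/bin/bash
# Build a salmon transcriptome index and quantify paired-end reads.
# File names are placeholders; the guard skips gracefully when salmon
# is not installed on this machine.
if command -v salmon >/dev/null 2>&1; then
    salmon index -t Rattus_norvegicus.Rnor_6.0.cdna.all.fa.gz -i salmon_index
    salmon quant -i salmon_index -l A \
        -1 reads_1.fastq.gz -2 reads_2.fastq.gz \
        -p 16 -o quant_out
    status="salmon commands issued"
else
    status="salmon not found; install it first"
fi
echo "$status"
```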

6.1 years ago
neranjan ▴ 70

I think the answer is to provide more memory for the run. Thank you, ATpoint.


There is no need to close this question. Just accepting this as an answer is sufficient.
