Error for hisat2 index for human genome
8.0 years ago
jolin0701-dy ▴ 100

I'd like to build a hisat2 index for the human genome.

Here are my commands:

$ extract_exons.py human.gtf > human.exon

$ extract_splice_sites.py human.gtf > human.ss

$ hisat2-build -p 4 human.fa --ss human.ss --exon human.exon human_tran

After quite a long time, it finished.

Here are the outputs:

-rw-rw-r-- 1 liz liz 10212687914 Nov 26 00:43 human_tran.0.rf

-rw-rw-r-- 1 liz liz       15588 Nov 23 23:21 human_tran.1.ht2

-rw-rw-r-- 1 liz liz 10496890898 Nov 26 00:44 human_tran.1.rf

-rw-rw-r-- 1 liz liz           4 Nov 23 23:21 human_tran.2.ht2

-rw-rw-r-- 1 liz liz 11895199208 Nov 26 00:46 human_tran.2.rf

-rw-rw-r-- 1 liz liz       11294 Nov 23 22:32 human_tran.3.ht2

-rw-rw-r-- 1 liz liz  8640102706 Nov 26 00:32 human_tran.3.rf

-rw-rw-r-- 1 liz liz   736462267 Nov 23 22:32 human_tran.4.ht2

-rw-rw-r-- 1 liz liz    13164848 Nov 23 22:33 human_tran.7.ht2

-rw-rw-r-- 1 liz liz     2591430 Nov 23 22:33 human_tran.8.ht2

But I think the output is incomplete, probably because of the low RAM on our cluster.
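One way to check this suspicion: a finished hisat2-build run normally leaves eight files, `<base>.1.ht2` through `<base>.8.ht2` (or `.ht2l` for a large index). The sketch below simply tests that each one exists. The function name `check_ht2_index` is a hypothetical helper, not part of hisat2.

```shell
# Hypothetical helper: report which of the eight expected index files are absent.
# Usage: check_ht2_index BASENAME [SUFFIX]   (SUFFIX defaults to ht2; use ht2l for large indexes)
check_ht2_index() {
    base=$1
    suffix=${2:-ht2}
    missing=0
    for i in 1 2 3 4 5 6 7 8; do
        f="$base.$i.$suffix"
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            missing=$((missing + 1))
        fi
    done
    echo "$missing of 8 index files missing"
    # Return success only when nothing is missing
    [ "$missing" -eq 0 ]
}
```

Run as `check_ht2_index human_tran` in the output directory. For the listing above it would flag `human_tran.5.ht2` and `human_tran.6.ht2`, which fits a build that stopped before finishing.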

A note in the hisat2 manual says that if you use --snp, --ss, and/or --exon, hisat2-build will need about 200GB of RAM for a genome the size of the human genome, because index building involves a graph construction. Otherwise, you can build an index on a desktop with 8GB of RAM.

My questions are:

1. Is it possible to use --ss --exon with quite low RAM?

2. What is the difference between index files built with and without --ss --exon?
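To compare a node against that figure before starting a long build, something like the following could be used. It is a minimal sketch that assumes a Linux system exposing `/proc/meminfo`; the helper name `has_enough_ram` is made up for illustration, and the 200 in the usage line is just the number quoted from the manual.

```shell
# Hypothetical helper: succeed if total RAM (in whole GB) is at least REQUIRED_GB.
# Usage: has_enough_ram REQUIRED_GB [MEMINFO_FILE]   (file defaults to /proc/meminfo on Linux)
has_enough_ram() {
    required_gb=$1
    meminfo=${2:-/proc/meminfo}
    # MemTotal is reported in kB
    total_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    total_gb=$((total_kb / 1024 / 1024))
    echo "total RAM: ${total_gb} GB, required: ${required_gb} GB"
    [ "$total_gb" -ge "$required_gb" ]
}
```

For example, `has_enough_ram 200 || echo "this node is below the manual's figure"`. On a cluster, the memory actually granted to a job by the scheduler may be lower than the node total.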

Thanks~~~


If there was no error in the log file for the run above, then a "cautiously optimistic" answer to #1 may be "yes", though I find it odd that the manual clearly says 200GB would be needed for that option. Sometimes programmers refine their code, and the manual may be the last thing they worry about updating.

Is the answer to #2 not in the manual? (I have not used those options.)

8.0 years ago

1 Is it possible to use --ss --exon with quite low RAM?

I agree with @genomax2's comment. However, I'd like to know whether you checked the log file and found nothing, or found something (a warning or the like). Personally, I think the 200 GB of RAM they specify is what you would need, when using those files, to keep the build from getting slow as hell. In fact, you say "after quite a long time, it finished", which in my experience is not usually the case with hisat2-build. So I would be "cautiously optimistic" too!

2 What is the difference of index files with and without --ss --exon function?

The --ss and --exon files help the index be precise about where the gene features start and end. In other words, you provide a list of coordinates that represent junctions and exons, and the index will fit your data better, which should speed things up. An index built without --ss and --exon will still work, but it is not optimized.
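If memory turns out to be the limit, one lower-memory route is to build a plain index (no --ss/--exon, so no graph construction) and hand the splice sites to the aligner instead, through hisat2's `--known-splicesite-infile` option. The sketch below assumes single-end reads; the function name and all file names in the usage line are placeholders, and the results are not guaranteed to match those from a splice-aware index.

```shell
# Hypothetical wrapper: plain index build, then alignment with known splice sites.
# Usage: build_and_align_low_mem GENOME_FA SPLICE_SITES READS_FQ INDEX_BASE OUT_SAM
build_and_align_low_mem() {
    genome=$1        # e.g. human.fa
    splice_sites=$2  # e.g. human.ss from extract_splice_sites.py
    reads=$3         # single-end FASTQ (assumption)
    index_base=$4    # basename for the plain index
    out_sam=$5
    # Step 1: index without --ss/--exon
    hisat2-build -p 4 "$genome" "$index_base" &&
    # Step 2: supply the splice sites at alignment time
    hisat2 -p 4 -x "$index_base" --known-splicesite-infile "$splice_sites" \
        -U "$reads" -S "$out_sam"
}
```

For example: `build_and_align_low_mem human.fa human.ss reads.fq human_plain aligned.sam`.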


Thanks for your answers.

Should I use --large-index for hisat2-build?

According to the manual, hisat2-build can generate either small or large indexes, and the wrapper decides which based on the length of the input genome. If the reference does not exceed 4 billion characters but a large index is preferred, the user can specify --large-index to force hisat2-build to build a large index instead.
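To see which side of that cutoff a reference falls on, the sequence characters in the FASTA can be counted. This is only a rough sketch: the helper names are made up, and the 4000000000 threshold is taken from the manual text quoted above rather than from the hisat2 source.

```shell
# Hypothetical helper: count sequence characters in a FASTA file (header lines excluded).
fasta_length() {
    grep -v '^>' "$1" | tr -d '\n\r \t' | wc -c | tr -d ' '
}

# Hypothetical helper: guess the index type from the quoted 4-billion-character cutoff.
index_type_for() {
    len=$(fasta_length "$1")
    if [ "$len" -gt 4000000000 ]; then
        echo large
    else
        echo small
    fi
}
```

For example, `index_type_for human.fa`. The human genome is roughly 3 billion bases, so a small index would be expected unless --large-index is given.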

Thanks....


I've never had to deal with that option, so I can't really say anything informed about it. What I can say is that you might let the program choose, as a first try.
