I'd like to build a HISAT2 index for the human genome.
Here are my commands:
$ extract_exons.py human.gtf > human.exon
$ extract_splice_sites.py human.gtf > human.ss
$ hisat2-build -p 4 human.fa --ss human.ss --exon human.exon human_tran
After quite a long time, it finished.
Here are the outputs:
-rw-rw-r-- 1 liz liz 10212687914 Nov 26 00:43 human_tran.0.rf
-rw-rw-r-- 1 liz liz 15588 Nov 23 23:21 human_tran.1.ht2
-rw-rw-r-- 1 liz liz 10496890898 Nov 26 00:44 human_tran.1.rf
-rw-rw-r-- 1 liz liz 4 Nov 23 23:21 human_tran.2.ht2
-rw-rw-r-- 1 liz liz 11895199208 Nov 26 00:46 human_tran.2.rf
-rw-rw-r-- 1 liz liz 11294 Nov 23 22:32 human_tran.3.ht2
-rw-rw-r-- 1 liz liz 8640102706 Nov 26 00:32 human_tran.3.rf
-rw-rw-r-- 1 liz liz 736462267 Nov 23 22:32 human_tran.4.ht2
-rw-rw-r-- 1 liz liz 13164848 Nov 23 22:33 human_tran.7.ht2
-rw-rw-r-- 1 liz liz 2591430 Nov 23 22:33 human_tran.8.ht2
But I think the output is incomplete: human_tran.5.ht2 and human_tran.6.ht2 are missing from the listing. This is probably due to the limited RAM on our cluster.
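A quick way to confirm which index files are missing is sketched below. It assumes a regular (small) index, where hisat2-build writes eight files named <base>.1.ht2 through <base>.8.ht2; large indexes use the .ht2l suffix instead. The function name missing_index_files is made up for this example.

```shell
#!/usr/bin/env bash
# List expected HISAT2 index files that are missing or empty.
# Assumption: a small index with the .ht2 suffix (large indexes use .ht2l).
missing_index_files() {
    local base="$1" i f
    for i in 1 2 3 4 5 6 7 8; do
        f="${base}.${i}.ht2"
        # -s is true only if the file exists and is larger than zero bytes
        [ -s "$f" ] || echo "$f"
    done
}

# Check the index built above; prints nothing if all eight files are present.
missing_index_files human_tran
```

Run in the directory listed above, this should print human_tran.5.ht2 and human_tran.6.ht2.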
The notes in the HISAT2 manual say: "If you use --snp, --ss, and/or --exon, hisat2-build will need about 200GB RAM for the human genome size as index building involves a graph construction. Otherwise, you will be able to build an index on your desktop with 8GB RAM."
My questions are:
1. Is it possible to use --ss and --exon with fairly little RAM?
2. What is the difference between index files built with and without the --ss and --exon options?
Thanks~~~
If the log file for the run above shows no error, then a "cautiously optimistic" answer to #1 may be "yes". I do find it odd, though, since the manual clearly says that 200GB would be needed for those options. Sometimes programmers refine their code, and the manual may be the last thing they worry about updating.
Is the answer to #2 not in the manual? (I have not used those options myself.)
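If no log was kept, one option is to rerun the build through a small wrapper that saves all output and reports the exit status. This is only a sketch: run_logged and build.log are names invented here, not part of HISAT2. In bash, a process killed by a signal reports 128 plus the signal number, so a status of 137 (SIGKILL) would be consistent with the kernel's out-of-memory killer, though it does not prove it.

```shell
#!/usr/bin/env bash
# Run a command, capture stdout and stderr in a log file, and report the exit status.
# run_logged is a hypothetical helper for this example, not a HISAT2 tool.
run_logged() {
    local log="$1" status=0
    shift
    # "|| status=$?" records a failure without aborting under "set -e"
    "$@" > "$log" 2>&1 || status=$?
    echo "exit status: $status (log: $log)"
    return "$status"
}

# Example, commented out because the build takes a long time:
# run_logged build.log hisat2-build -p 4 human.fa --ss human.ss --exon human.exon human_tran
```

A nonzero status, or an error message near the end of build.log, would suggest that the build did not finish cleanly.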