6.8 years ago
Vasu
As described in the paper, I first extracted the splice sites and then the exons from the GTF. Next, I ran hisat2-build:
hisat2-build --ss gencode.v27.primary_assembly.annotation.ss --exon gencode.v27.primary_assembly.annotation.exon annot_AND_refFASTA/Homo_sapiens.GRCh38.dna.primary_assembly.fa Hisat2index
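The three steps above (splice sites, then exons, then `hisat2-build`) can be wrapped in one small shell function. This is a sketch under stated assumptions, not the paper's exact procedure. It assumes HISAT2's bundled helper scripts `hisat2_extract_splice_sites.py` and `hisat2_extract_exons.py` are on `PATH` (older releases may ship them without the `hisat2_` prefix). The function name `build_hisat2_index`, its argument order and the default thread count are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a splice-aware HISAT2 index from a GTF and a FASTA.
# Assumes hisat2-build and HISAT2's helper scripts are available on PATH.
build_hisat2_index() {
  local gtf=$1 fasta=$2 prefix=$3 threads=${4:-1}
  local base=${gtf%.gtf}   # e.g. annotation.gtf -> annotation

  # 1. Extract splice sites from the annotation
  hisat2_extract_splice_sites.py "$gtf" > "$base.ss" || return 1
  # 2. Extract exons from the annotation
  hisat2_extract_exons.py "$gtf" > "$base.exon" || return 1
  # 3. Build the index using both annotation-derived files
  hisat2-build -p "$threads" --ss "$base.ss" --exon "$base.exon" "$fasta" "$prefix"
}
```

Assuming the GTF is named `gencode.v27.primary_assembly.annotation.gtf`, calling `build_hisat2_index gencode.v27.primary_assembly.annotation.gtf annot_AND_refFASTA/Homo_sapiens.GRCh38.dna.primary_assembly.fa Hisat2index 8` would produce the same `.ss`/`.exon` file names as the command above.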
After a few minutes, this is what I saw:
Strings: unpacked
Local offset rate: 3 (one in 8)
Local fTable chars: 6
Local sequence length: 57344
Local sequence overlap between two consecutive indexes: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
annot_AND_refFASTA/Homo_sapiens.GRCh38.dna.primary_assembly.fa
Reading reference sizes
Time reading reference sizes: 00:00:34
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:01:10
Time to read SNPs and splice sites: 00:00:01
Killed
Does "Killed" mean there was an error?
I do, however, see files such as Hisat2index.0.rf, Hisat2index.1.ht2, Hisat2index.2.ht2, Hisat2index.3.ht2, Hisat2index.4.ht2, Hisat2index.5.ht2, Hisat2index.6.ht2, Hisat2index.7.ht2 and Hisat2index.8.ht2.
Did everything go well, or do I need to fix something?
Your index may well be incomplete.
You can download pre-made indexes directly from here.
But I would like to build my own using the GTF.
How much memory do you have available? That may be the limiting factor in your case, unless you are running this on a cluster and ran out of wall-clock time.
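For background: the shell prints `Killed` when a process is terminated by SIGKILL, which is typically what the Linux out-of-memory killer (or a scheduler enforcing a memory limit) sends. By shell convention the exit status is then 128 + 9 = 137. A minimal sketch that decodes an exit status this way; `describe_exit` is a hypothetical helper, and the self-kill below only simulates what the OOM killer does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: describe how a command ended from its exit status.
# By convention, a status above 128 means "terminated by signal (status - 128)".
describe_exit() {
  local status=$1
  if [ "$status" -eq 0 ]; then
    echo "finished successfully"
  elif [ "$status" -gt 128 ]; then
    local sig=$((status - 128))
    echo "killed by signal $sig (SIG$(kill -l "$sig"))"
  else
    echo "failed with exit status $status"
  fi
}

# Simulate the OOM killer: a child shell receives SIGKILL.
# The parent shell usually reports this as "Killed" on stderr.
status=0
bash -c 'kill -KILL $$' || status=$?
describe_exit "$status"   # killed by signal 9 (SIGKILL)
```

Running `describe_exit $?` right after `hisat2-build` finishes would distinguish a SIGKILL like the one in the log above from an ordinary error exit.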
OK. I'm building the index on a cluster, not on my desktop computer.
When you rerun it, ask for more RAM and 3-4 h of run time, just to be safe.
I will give this a try. Thanks!
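On a cluster, memory and wall-clock time are requested from the scheduler. The thread does not say which scheduler is used; assuming SLURM, a job script might look like the sketch below. The 160G value is the protocol figure quoted at the end of this thread, while the time limit and thread count are illustrative assumptions to adjust for your site.

```shell
#!/usr/bin/env bash
#SBATCH --job-name=hisat2-build
#SBATCH --mem=160G
#SBATCH --time=06:00:00
#SBATCH --cpus-per-task=8
# Assumes a SLURM cluster; the resource values above are illustrative.

set -euo pipefail

# Use as many threads as CPUs were allocated (falls back to 1 outside SLURM).
hisat2-build -p "${SLURM_CPUS_PER_TASK:-1}" \
  --ss gencode.v27.primary_assembly.annotation.ss \
  --exon gencode.v27.primary_assembly.annotation.exon \
  annot_AND_refFASTA/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
  Hisat2index
```

After the job ends, `sacct -j <jobid> --format=JobID,State,MaxRSS,Elapsed` reports its final state and peak memory use, which helps tell a memory kill from a time-limit kill.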
As you suggested, I reran it with 30G of memory and more run time. This is what I see:
I don't see any "Killed" message. Does that mean everything is fine? It took only 20 minutes, and I have the output files.
There shouldn't be any .rf files in there; these are temporary files. If the index is complete, you will have the following files, e.g.:
And is that all the output you got? I had much more, and 20 minutes seems a bit short, even with 30G.
These are the files I got.
Hisat2index.0.rf, Hisat2index.1.ht2, Hisat2index.2.ht2, Hisat2index.3.ht2, Hisat2index.4.ht2, Hisat2index.5.ht2, Hisat2index.6.ht2, Hisat2index.7.ht2, Hisat2index.8.ht2
Do you think this is right, or do I need to try with more memory and time? I requested 30G and 6 h of run time, but the job completed in 20 minutes.
Test it by running an alignment with a small number of reads. If something is wrong with the index, that alignment job should fail.
If the alignment job fails, what should I do? Do I need to build the index again?
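A quick smoke test along those lines: take the first few thousand reads from a FASTQ file and align them against the new index. This is a minimal sketch assuming an uncompressed single-end FASTQ with exactly four lines per record; `subsample_fastq`, `smoke_test_index`, the output file names and the default of 10,000 reads are hypothetical choices. `-x`, `-U` and `-S` are HISAT2's options for the index prefix, unpaired reads and SAM output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep the first N reads of an uncompressed FASTQ
# (assumes each record is exactly 4 lines).
subsample_fastq() {
  local n=$1 in=$2 out=$3
  head -n $((n * 4)) "$in" > "$out"
}

# Hypothetical helper: align a small subset of reads against an index prefix.
# A broken or incomplete index should make hisat2 exit with a non-zero status.
smoke_test_index() {
  local index=$1 reads=$2 n=${3:-10000}
  local subset=smoke_test.fq
  subsample_fastq "$n" "$reads" "$subset" || return 1
  if hisat2 -x "$index" -U "$subset" -S smoke_test.sam; then
    echo "alignment finished; check the alignment summary hisat2 prints"
  else
    echo "alignment failed; the index probably needs to be rebuilt" >&2
    return 1
  fi
}
```

For example, `smoke_test_index Hisat2index reads.fastq` (file name hypothetical) should fail quickly if the index files are missing or truncated.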
BTW, these are the file sizes I see: GRCh38_Hisat2_index.4.ht2 (703M), GRCh38_Hisat2_index.3.ht2 (12K), GRCh38_Hisat2_index.2.ht2 (0), GRCh38_Hisat2_index.1.ht2 (8K), GRCh38_Hisat2_index.8.ht2 (1.1K), GRCh38_Hisat2_index.7.ht2 (5K) and GRCh38_Hisat2_index.0.rf (39G).
I don't think your index is complete, especially as you still have a temp file (Hisat2index.0.rf) and the .ht2 files are very small; one is even 0 bytes! Mine range from 12 KB to 1.8 G and, more importantly, are very similar in size to the index files I downloaded from HISAT2. As there is no error message, I cannot tell you what went wrong. However, according to this protocol you need 160G for the whole human genome, so my guess is that memory is the issue. You need to request at least 160G of RAM.
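The checks applied here (no leftover `.0.rf` temporary file, all of `.1.ht2` to `.8.ht2` present, none of them empty) can be scripted. A minimal sketch, assuming the eight-file `.ht2` layout seen in this thread; `check_hisat2_index` is a hypothetical helper, and passing it does not prove the index is correct, so the small test alignment is still worthwhile.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a HISAT2 index prefix.
# Returns 0 only if PREFIX.1.ht2 .. PREFIX.8.ht2 exist and are non-empty
# and no leftover PREFIX.*.rf temporary file remains.
check_hisat2_index() {
  local prefix=$1 ok=0 i f
  for i in 1 2 3 4 5 6 7 8; do
    f="$prefix.$i.ht2"
    if [ ! -e "$f" ]; then
      echo "missing: $f"; ok=1
    elif [ ! -s "$f" ]; then
      echo "empty: $f"; ok=1
    fi
  done
  # If no .rf file exists, the unmatched glob stays literal and -e is false.
  for f in "$prefix".*.rf; do
    if [ -e "$f" ]; then
      echo "leftover temp file: $f"; ok=1
    fi
  done
  return "$ok"
}
```

For the listing above, `check_hisat2_index GRCh38_Hisat2_index` would at least flag the empty `.2.ht2` file and the leftover `.0.rf` file; `ls -lh GRCh38_Hisat2_index.*` then makes it easy to compare sizes with a downloaded index.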