Hello everyone, I'm new to RNA-Seq analysis and I'm learning to use HISAT2. My question relates to the index-building step. The command lines are the following, according to nprot2006:
$ extract_splice_sites.py hg19_data/genes/hg19.gtf >hg19.ss (ready)
$ extract_exons.py hg19_data/genes/hg19.gtf >hg19.exon (ready)
But while I was running the second part:
hisat2-build --ss hg19.ss --exon chrX.exon hg19_data/genome/hg19.fa hg19_tran (it is not ready)
it got as far as generating seven files with the following names:
hg19_tran.1.ht2; hg19_tran.2.ht2; hg19_tran.3.ht2; hg19_tran.4.ht2; hg19_tran.7.ht2; hg19_tran.8.ht2; hg19_tran.rf
The problem: last Friday there was a power outage, and the computer shut down before the build finished.
My question: does anyone know how many files are generated when building the index? And if only some files are missing, can I generate just the ones I'm missing?
I look forward to your comments.
Thank you all.
If there was a computer crash while the index was being built, you should start over at the hisat2-build step (delete those *.ht2 files), just to be safe.
Are you saying that all the work was lost? Deleting those *.ht2 files would be like starting over...
At the index build step, unfortunately yes, since I don't think there is any way for the program to pick up from where things went down.
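On the "how many files" part: as far as I know, a finished hisat2-build run leaves eight files, <base>.1.ht2 through <base>.8.ht2 (very large genomes may get a .ht2l extension instead). Treat that count as an assumption and check it against the HISAT2 manual for your version. A minimal sketch of a check under that assumption, where check_ht2_index is just a helper name made up for this post:

```shell
# Report which of the expected HISAT2 index files are missing or empty.
# Assumption: a complete (small) index is <base>.1.ht2 ... <base>.8.ht2.
check_ht2_index() {
    base="$1"
    missing=0
    for i in 1 2 3 4 5 6 7 8; do
        f="${base}.${i}.ht2"
        # -s is true only if the file exists and is non-empty
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            missing=$((missing + 1))
        fi
    done
    echo "$missing of 8 index files missing"
    # Return success only when nothing is missing
    [ "$missing" -eq 0 ]
}
```

You would call it as `check_ht2_index hg19_tran`. With the files listed in the question it should flag hg19_tran.5.ht2 and hg19_tran.6.ht2. Note that a file being present does not prove it is complete, since anything being written at the moment of the crash may be truncated, which is why deleting everything and rerunning hisat2-build is the safe route.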
Or you can download a prebuilt index for hg19 from the HISAT2 group website (genome only, primary assembly): ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/hg19.tar.gz
Where can I get its corresponding FASTA reference sequence and the annotation file?
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/ (chromFa.tar.gz), and the GTF/GFF3 from the UCSC Table Browser.
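If you go the chromFa.tar.gz route, you will probably need to merge the per-chromosome files into a single FASTA before giving it to hisat2-build. A rough sketch, assuming the archive unpacks into files named chr*.fa; combine_chrom_fasta is a hypothetical helper name, not part of any tool:

```shell
# Concatenate per-chromosome FASTA files into one genome FASTA.
# Assumption: the unpacked archive contains files named chr*.fa.
combine_chrom_fasta() {
    src_dir="$1"
    out_fa="$2"
    # The shell expands chr*.fa in sorted order; write the output
    # under a name that does not itself match chr*.fa
    cat "$src_dir"/chr*.fa > "$out_fa"
    # Print the number of sequence records as a quick sanity check
    grep -c '^>' "$out_fa"
}
```

For example, after unpacking the archive into a directory called chroms, `combine_chrom_fasta chroms hg19_data/genome/hg19.fa` would write the merged file and print how many sequences it holds. The glob picks up every chr*.fa in that directory, so remove any files you do not want in the index first, and make sure the chromosome names match those used in your GTF.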