Hi,
I have a genome reference and annotation files, plus a hisat2 index from the Hisat2 website:
Genome reference: ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_27/GRCh38.p10.genome.fa.gz
GENCODE gene annotation: ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_27/gencode.v27.annotation.gtf.gz
hisat2 index: genome_snp_tran (GRCh38) from https://ccb.jhu.edu/software/hisat2/index.shtml
I see that the chromosome names in genome.fa and the gencode gtf file start with chr, but the hisat2 index doesn't have chr in its names.
Should I remove chr from the fasta and gtf files? Or should I build my own hisat2 index?
As ATpoint mentioned, both are indeed technically OK.
I would additionally suggest evaluating how many 'dependencies' you have for each of the file types (fasta etc.). I mean, if you also have a genome browser associated with it, blastDBs ... If those are plenty, it might be more feasible to rebuild the hisat index instead of removing chr from a whole list of files and related resources.
If I build the hisat2 index on my own, will that index have chr in it?
Do you think this is the right way to build the hisat2 index? Got this from the Hisat2 stringtie paper:
Second, build a HISAT2 index:
If your fasta has "chr" then your index will have "chr".
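Since the index simply inherits whatever names the fasta uses, it can help to list the names in both files before building anything. Below is a minimal sketch, assuming a plain FASTA (name = first word after ">") and a tab-separated GTF (name = first column); the function names and the tiny demo records are made up for illustration and are not from the thread.

```python
import io


def fasta_names(handle):
    """Collect sequence names: the first word of every '>' header line."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_names(handle):
    """Collect sequence names: column 1 of every non-comment GTF line."""
    names = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


# Hypothetical demo data standing in for the real files.
demo_fasta = io.StringIO(">chr1 1\nACGT\n>chrM MT\nACGT\n")
demo_gtf = io.StringIO(
    "##description: demo annotation\n"
    'chr1\tHAVANA\tgene\t1\t4\t.\t+\t.\tgene_id "g1";\n'
    '1\tHAVANA\tgene\t1\t4\t.\t+\t.\tgene_id "g2";\n'
)

fa = fasta_names(demo_fasta)
gtf = gtf_names(demo_gtf)

# Names used by the annotation that the fasta does not know about.
missing = sorted(gtf - fa)
print("GTF names absent from FASTA:", missing)
```

For the real gzipped downloads, the same functions should work on handles opened with `gzip.open(path, "rt")`.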
"chr" is just a part of the name of chromosomes. If your fasta file looks like
myfabulouschromosome1
then that is perfectly valid. Just make sure your annotation and fasta file use the same notation.
Yes, this is how gtf and fasta files look.
Gencode GTF file:
Fasta file:
Also other lncipedia_gtf:
Hi Wouter,
I have an error while building the index.
What could be the problem here?
It should be --ss and not -ss.
Oh ya...so sorry. Thank you!!
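For anyone landing here with the same error, here is a rough sketch of assembling the hisat2-build call with the double-dash options. The file names (genome.fa, genome.ss, genome.exon, genome_tran) are placeholders, not the ones from the post above, and the helper function is hypothetical. The splice-site and exon files are assumed to have been produced beforehand with the extraction scripts shipped with HISAT2 (their exact names differ between versions).

```python
def hisat2_build_cmd(fasta, index_base, splice_sites=None, exons=None, threads=1):
    """Build the argument list for hisat2-build (nothing is executed here)."""
    # The HISAT2 manual describes --ss and --exon as options to use together,
    # so this sketch refuses a call that supplies only one of them.
    if (splice_sites is None) != (exons is None):
        raise ValueError("give both splice_sites and exons, or neither")
    cmd = ["hisat2-build", "-p", str(threads)]
    if splice_sites is not None:
        cmd += ["--ss", splice_sites, "--exon", exons]  # note: two dashes
    cmd += [fasta, index_base]
    return cmd


# Placeholder file names, for illustration only.
cmd = hisat2_build_cmd(
    "genome.fa", "genome_tran",
    splice_sites="genome.ss", exons="genome.exon", threads=4,
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once hisat2-build is on the PATH.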
Both are OK, but removing chr is probably much faster.
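A minimal sketch of the removal, assuming the prefix only needs to be dropped from FASTA header lines and from the first column of the GTF. The function names are made up for illustration. A plain prefix strip turns chrM into M, while Ensembl-style naming typically uses MT, so it is worth checking the names the index actually contains before relying on this.

```python
def strip_chr(name):
    """Drop a leading 'chr' from a sequence name, if present."""
    return name[3:] if name.startswith("chr") else name


def convert_fasta_line(line):
    """Rename only header lines; sequence lines pass through unchanged."""
    if line.startswith(">"):
        head, sep, rest = line[1:].partition(" ")
        return ">" + strip_chr(head) + sep + rest
    return line


def convert_gtf_line(line):
    """Rename column 1 of feature lines; '#' comment lines pass through."""
    if line.startswith("#"):
        return line
    seqname, sep, rest = line.partition("\t")
    return strip_chr(seqname) + sep + rest


# Hypothetical example lines, not taken from the real files.
print(convert_fasta_line(">chr1 1\n"), end="")
print(convert_gtf_line("chrX\tHAVANA\tgene\t1\t4\t.\t+\t.\tgene_id \"g1\";\n"), end="")
```

Applied line by line to the fasta and to each gtf, this keeps both files on the same notation, which is the part that matters.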