HISAT2: Question regarding providing file path to indexed genome folder
1
0
Entering edit mode
4.1 years ago
venura ▴ 70

Hi,

I have a quick question about pointing HISAT2 to the indexed genome folder. This is the command I used:

hisat2 -p $threads --dta --rna-strandness RF -x /scratch/datasets/genome_indexes/other_genomes/potato/hisat2 -1 ${SAMPLE}.fq.gz -2 ${SAMPLE}.fq.gz -S ${SAMPLE}.sam

after loading the module

module load HISAT2/2.2.0-foss-2018b

I was running the script on our ADA cluster and got the following error

sh: /sw/eb/software/HISAT2/2.2.0-foss-2018b/bin/hisat2_read_statistics.py: No such file or directory
(ERR): "/scratch/datasets/genome_indexes/other_genomes/potato/hisat2" does not exist
Exiting now ...

Can someone help me to resolve this issue? Thanks in advance.

HISAT2 RNA-Seq • 2.4k views

Output of ls /scratch/datasets/genome_indexes/other_genomes/potato/?


DM_1-3_516_R44_potato_genome_assembly.v6.1.1.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.2.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.3.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.4.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.5.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.6.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.7.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.8.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.fa


Based on the listing above it looks like there is no hisat2 directory. So you will need to try

-x /scratch/datasets/genome_indexes/other_genomes/potato/DM_1-3_516_R44_potato_genome_assembly.v6.1

My apologies. For the listing above I was using

ls /scratch/datasets/genome_indexes/other_genomes/potato/hisat2/

Here is the correct output for ls /scratch/datasets/genome_indexes/other_genomes/potato/

blast bowtie bowtie2 bwa hisat2 picard samtools


Then simply insert hisat2 in the right spot in the path above.


Sorry, I think I confused you;

The output for ls /scratch/datasets/genome_indexes/other_genomes/potato/ is (answer to ATPoint's question)

blast bowtie bowtie2 bwa hisat2 picard samtools

When I ran the job I used the following code

hisat2 -p $threads --dta --rna-strandness RF -x /scratch/datasets/genome_indexes/other_genomes/potato/hisat2 -1 ${SAMPLE}.fq.gz -2 ${SAMPLE}.fq.gz -S ${SAMPLE}.sam

That points to the hisat2 folder, and I got the error mentioned in the original post.


These are not genome indices, are they? A HISAT2 index consists of several files, e.g. genome.1.ht2, genome.2.ht2, and so on.

This is how it should look e.g. for a genome called mm10.fa:

mm10.1.ht2  mm10.2.ht2  mm10.3.ht2  mm10.4.ht2  mm10.5.ht2  mm10.6.ht2  mm10.7.ht2  mm10.8.ht2

Here it would be -x mm10, because you have to provide the prefix (basename) that the index files share, not a directory or a single file. HISAT2 then picks up the individual .ht2 files as needed.
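To make the naming rule concrete, here is a minimal sketch that derives the -x value from a directory holding one HISAT2 index by taking everything before .1.ht2. The function name hisat2_index_prefix is hypothetical (it is not part of HISAT2), and the sketch assumes a small index with the .ht2 extension and only one index per directory.

```shell
# Hypothetical helper (not part of HISAT2): print the value to pass to -x
# for a directory that contains a single HISAT2 index.
hisat2_index_prefix() {
  idx_dir=$1
  for idx_file in "$idx_dir"/*.1.ht2; do
    # If the glob matched nothing it stays literal, so test for a real file.
    [ -f "$idx_file" ] || continue
    # Strip the ".1.ht2" ending to leave the shared prefix.
    printf '%s\n' "${idx_file%.1.ht2}"
    return 0
  done
  echo "no *.1.ht2 file found in $idx_dir" >&2
  return 1
}
```

For the mm10 example above, calling hisat2_index_prefix on the directory containing mm10.1.ht2 would print that directory followed by /mm10, which is what -x expects.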


Inside the hisat2 folder ( ls /scratch/datasets/genome_indexes/other_genomes/potato/hisat2/), there are eight files (I guess that is the default number it makes)

DM_1-3_516_R44_potato_genome_assembly.v6.1.1.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.2.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.3.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.4.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.5.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.6.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.7.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.8.ht2

Ah, I see. That means I need to use the prefix DM_1-3_516_R44_potato_genome_assembly.v6.1 (without the trailing .1.ht2), as follows: /scratch/datasets/genome_indexes/other_genomes/potato/hisat2/DM_1-3_516_R44_potato_genome_assembly.v6.1

Thank you! I will do that


Even after changing the path, I am getting the following error (I killed the job after this error to save my service units)

sh: /sw/eb/software/HISAT2/2.2.0-foss-2018b/bin/hisat2_read_statistics.py: No such file or directory (ERR):

Probably due to a problem with the cluster? (I emailed them too, but no reply yet.)


Are the fastq files in the right spot? Are those variables correctly pointing to those files?


They are in the same directory the job is running from. I also checked the file extensions. Nothing makes sense :(

-1 ${SAMPLE}.fq.gz -2 ${SAMPLE}.fq.gz

This, by the way, is the same file given for both mates. Try simplifying your script.
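A small guard can catch this kind of slip before the aligner starts. The following is only a sketch; check_mates is a hypothetical helper name, and it only verifies that the two arguments are different, readable files.

```shell
# Hypothetical helper: fail early if the two mate files are identical
# paths or if either of them cannot be read.
check_mates() {
  if [ "$1" = "$2" ]; then
    echo "mate 1 and mate 2 are the same file: $1" >&2
    return 1
  fi
  for mate_file in "$1" "$2"; do
    if [ ! -r "$mate_file" ]; then
      echo "cannot read $mate_file" >&2
      return 1
    fi
  done
  return 0
}

# Possible use inside a job script (file naming is an assumption):
#   check_mates "${SAMPLE}_1.fq.gz" "${SAMPLE}_2.fq.gz" || exit 1
```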


Oh, shoot! You are correct. Still learning the ABCs.

4.1 years ago
ATpoint 85k

I personally always try to keep it as simple as possible. Copy all the fastq files into one folder and give them clear names, e.g.

Sample1_1.fastq.gz Sample1_2.fastq.gz Sample2_1.fastq.gz Sample2_2.fastq.gz

Then use the simplest possible script (or learn how to use workflow managers):

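# Index basename: path plus file prefix, without the .N.ht2 ending
Idx=path/to/idxfiles
# Example options taken from the question; replace with your own
OPTS="--dta --rna-strandness RF"

for i in *_1.fastq.gz
  do
  SAMPLE=${i%_1.fastq.gz}
  # OPTS is left unquoted on purpose so it splits into separate options
  hisat2 ${OPTS} -x "${Idx}" -1 "${SAMPLE}_1.fastq.gz" -2 "${SAMPLE}_2.fastq.gz" \
  | samtools view -b -o "${SAMPLE}.bam" -
  done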

That's it. Eliminate unnecessary elements from your script, including echo statements that only report status. Trim it down to the strictly necessary parts and get it running. You can add more once it works.


Will do the needful and get back with the outcome! Thanks a lot, ATpoint! 🙏


Everything is running fine and I got BAM files too. :) The only exception is the following message (I guess it has something to do with the installation on the ADA cluster, since I don't see that script there):

/sw/eb/software/HISAT2/2.2.0-foss-2018b/bin/hisat2_read_statistics.py: No such file or directory (ERR)

PS: I would appreciate it if you could point me to a good workflow management tool and a tutorial for analyses like this one.
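Whether the helper script is really missing from the module's bin directory can be checked with a short sketch like the one below. The function name sibling_script_exists is hypothetical. The check only looks for a file with the given name next to the executable found on PATH, on the assumption (suggested by the path in the message) that the hisat2 launcher expects the script in the same directory.

```shell
# Hypothetical helper: report whether a script with the given name sits in
# the same directory as a tool that is found on PATH.
sibling_script_exists() {
  tool_path=$(command -v "$1") || {
    echo "$1 not found on PATH" >&2
    return 2
  }
  [ -e "$(dirname "$tool_path")/$2" ]
}

# Possible use after loading the module:
#   sibling_script_exists hisat2 hisat2_read_statistics.py \
#     || echo "helper script is missing next to hisat2" >&2
```

If the script is absent, that would be consistent with an incomplete installation rather than a problem with the command or the input files, and is worth reporting to the cluster administrators.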
