Error: Could not find Bowtie 2 index files (GRCh38.*.bt2l)
8.1 years ago
ashMC ▴ 10

I am running tophat/2.1.1 with bowtie/2.2.9. I run the following command:

tophat -G reference_files/Homo_sapiens.GRCh38.86.gtf -p4 -o tophat_results/ reference_files/GRCh38 data/S_01_L003_R1.fastq

And I get the following error:

[2016-11-03 16:39:02] Beginning TopHat run (v2.1.1)
[2016-11-03 16:39:02] Checking for Bowtie
 Bowtie version:     2.2.9.0
[2016-11-03 16:39:02] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie 2 index files (reference_files/GRCh38.*.bt2l)

I only have .bt2 files.

tophat tophat2 bowtie bowtie2 RNA-seq • 17k views

You did not specify the basename of your bowtie2 index files on your command line. Try the following command:

tophat -p4 -G reference_files/Homo_sapiens.GRCh38.86.gtf -o tophat_results/ reference_files/Homo_sapiens.GRCh38.86 reference_files/GRCh38 data/S_01_L003_R1.fastq

That does not work. I get the same error: Error: Could not find Bowtie 2 index files (reference_files/GRCh38.*.bt2l)


What's the name of the index files you created?


GRCh38.no_alt_analysis_set.fna.bowtie_index.1.bt2l (then the same for 2,3,4 and rev.1,rev.2)


Then you need to use GRCh38.no_alt_analysis_set.fna.bowtie_index as the index basename.

tophat -p4 -G reference_files/Homo_sapiens.GRCh38.86.gtf -o tophat_results/ reference_files/GRCh38.no_alt_analysis_set.fna.bowtie_index reference_files/GRCh38 data/S_01_L003_R1.fastq

A suggestion: give your files simpler names to make this easier on yourself, and don't keep indexes and data in the same directory. Also replace any spaces in directory/file names with "_".
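
If you are unsure what the index basename is, one way to find it is to strip the ".1.bt2" or ".1.bt2l" suffix from the index file names. A minimal sketch, assuming bash; list_bt2_basenames is a hypothetical helper written for this thread, not something that ships with bowtie2 or tophat:

```shell
# Hypothetical helper: print the basename for each Bowtie 2 index found in a
# directory by stripping the ".1.bt2" / ".1.bt2l" suffix from the file name.
list_bt2_basenames() {
    local dir="$1" f
    for f in "$dir"/*.1.bt2 "$dir"/*.1.bt2l; do
        [ -e "$f" ] || continue                    # skip glob patterns that matched nothing
        case "$f" in
            *.rev.1.bt2|*.rev.1.bt2l) continue ;;  # reverse files belong to the same basename
        esac
        f="${f%.1.bt2l}"                           # large-index suffix
        f="${f%.1.bt2}"                            # small-index suffix
        printf '%s\n' "$f"
    done
}
```

Running list_bt2_basenames reference_files should print the value to pass to tophat as the index argument, for the files named in this thread reference_files/GRCh38.no_alt_analysis_set.fna.bowtie_index.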


Hello,

When I do this, I get the following error. What is the meaning of 'reference_files/GRCh38' after the 'reference_files/GRCh38.no_alt_analysis_set.fna.bowtie_index'?

It would seem that the second 'reference_files/GRCh38' is a typo?

Also - if I need a transcriptome-index file ('--transcriptome-index=transcriptome_data/known/'), would I add that right before 'data/S_01_L003_R1.fastq '?

[aconard@login001 scratch]$ tophat -p4 -G reference_files/GRCh38.gtf -o tophat_results/ reference_files/GRCh38.no_alt_analysis_set.fna.bowtie_index reference_files/GRCh38  data/S_01_L003_R1.fastq 

[2016-11-04 09:02:47] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2016-11-04 09:02:47] Checking for Bowtie
 Bowtie version:     2.2.9.0
[2016-11-04 09:02:48] Checking for Bowtie index files (genome)..
[2016-11-04 09:02:48] Checking for reference FASTA file
[2016-11-04 09:02:48] Generating SAM header for reference_files/GRCh38.no_alt_analysis_set.fna.bowtie_index
Traceback (most recent call last):
  File "/gpfs/runtime/opt/tophat/2.1.1/bin/tophat", line 4107, in <module>
    sys.exit(main())
  File "/gpfs/runtime/opt/tophat/2.1.1/bin/tophat", line 3961, in main
    params.read_params = check_reads_format(params, reads_list)
  File "/gpfs/runtime/opt/tophat/2.1.1/bin/tophat", line 1856, in check_reads_format
    zf = ZReader(f_name, params)
  File "/gpfs/runtime/opt/tophat/2.1.1/bin/tophat", line 1809, in __init__
    self.file=open(filename)
IOError: [Errno 2] No such file or directory: 'reference_files/GRCh38'

I copied and pasted the command from your original post since I thought you had the directory/file paths correct. The command as written assumes that you are running it in a directory which contains a subdirectory called reference_files. Is that not the case? Adjust the file paths to match where the files exist on your machine, relative to wherever you are running this command from.

In general, the command looks like this. Replace the /path_to/ part with a real file path on your computer.

tophat -p4 -G /path_to/Homo_sapiens.GRCh38.86.gtf -o tophat_results/ /path_to/GRCh38.no_alt_analysis_set.fna.bowtie_index /path_to/S_01_L003_R1.fastq
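
Before launching a long run, it can save time to confirm that the input files exist at the paths you typed. A minimal sketch, assuming bash; missing_inputs is a hypothetical helper and the /path_to/ values are placeholders. The index basename is a prefix rather than a file, so it is not checked here:

```shell
# Hypothetical pre-flight check: report every argument that is not an existing
# path and return non-zero if any of them is missing.
missing_inputs() {
    local f status=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example with the placeholder paths used in this thread:
# missing_inputs /path_to/Homo_sapiens.GRCh38.86.gtf /path_to/S_01_L003_R1.fastq \
#     && echo "inputs found"
```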

If you have pre-built the transcriptome index (with an earlier TopHat run that used -G together with --transcriptome-index), you can reuse it with --transcriptome-index and no longer need the -G option. As far as I know, TopHat still expects the genome index basename as the first positional argument, and you would add -T (--transcriptome-only) if you want to map only to the "known" transcriptome.

 tophat -p4 -o tophat_results/ --transcriptome-index=/path_to/transcriptome_data/known /path_to/GRCh38.no_alt_analysis_set.fna.bowtie_index /path_to/S_01_L003_R1.fastq
8.1 years ago
Jeffin Rockey ★ 1.3k

Most probably the issue is that the GRCh38 FASTA being used was indexed by another bowtie version, more precisely by an earlier bowtie2 version.

Earlier bowtie2 versions, such as the 2.0 series, produced .bt2 indices for GRCh38, not .bt2l. If you index the same GRCh38 genome with Bowtie2 2.2.9, the version your tophat currently uses, it produces .bt2l indices. When tophat verifies its inputs, the new version expects .bt2l indices to be present based on the size of the FASTA file, hence the error.

To summarise, there are two easy ways to resolve this:

1) For alignment, make tophat use the same bowtie2 version that was used for indexing.

2) Index the FASTA using bowtie2 2.2.9 and rerun tophat.
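
Before re-indexing, it may help to check which kind of index actually exists for the basename you pass to tophat. A minimal sketch, assuming bash; bt2_index_kind is a hypothetical helper, not part of bowtie2 or tophat, and the /path_to/ values in the comments are placeholders:

```shell
# Hypothetical helper: report whether a basename has a large (.bt2l) index,
# a small (.bt2) index, or no Bowtie 2 index at all.
bt2_index_kind() {
    local base="$1"
    if [ -e "$base.1.bt2l" ]; then
        echo large
    elif [ -e "$base.1.bt2" ]; then
        echo small
    else
        echo missing
    fi
}

# To follow option 2, confirm the version on your PATH and rebuild
# (placeholder paths; bowtie2-build takes the FASTA, then the output basename):
# bowtie2 --version
# bowtie2-build /path_to/GRCh38.fa /path_to/GRCh38
```

If bt2_index_kind prints "small" while tophat asks for .bt2l files, the mismatch described above is the likely cause.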

Hope this helps.

5.2 years ago
Arnaud Ceol ▴ 860

I had the same problem: bowtie2 (version 2.3.4.3) was building a .bt2 index by default. I had to pass the --large-index parameter to bowtie2-build to obtain the .bt2l index.
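
For reference, a sketch of how that call might be wrapped, assuming bash. build_large_index and the DRY_RUN switch are hypothetical conveniences added for illustration; only bowtie2-build and its --large-index option come from bowtie2 itself:

```shell
# Hypothetical wrapper around bowtie2-build that forces a large (.bt2l) index.
# With DRY_RUN=1 it only prints the command, which is handy for checking paths.
build_large_index() {
    local fasta="$1" base="$2"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo bowtie2-build --large-index "$fasta" "$base"
    else
        bowtie2-build --large-index "$fasta" "$base"
    fi
}

# Example with placeholder paths:
# DRY_RUN=1 build_large_index /path_to/GRCh38.fa /path_to/GRCh38
```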


This option worked for me also.
