Question

Unable to find Bowtie2 index

8.5 years ago

EpiExplorer ▴ 90

Hi,

I am unable to run Tophat2 as I get an error. Here is the command I ran:

tophat2  -p 5  -r 62 –library-type fr-firststrand  -G  /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/gene.gtf –o /home/jmotwani/RNASeq/Alignment_Tophat2 --BOWTIE2_INDEXES  /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/   C95VLANXX-2046D-01-01-01_L003_R1_Trimmed.fastq C95VLANXX-2046D-01-01-01_L003_R2_Trimmed.fastq

I get the following error after I run the above command:

[2016-05-22 22:20:05] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2016-05-22 22:20:05] Checking for Bowtie
          Bowtie version:    2.2.9.0
[2016-05-22 22:20:05] Checking for Bowtie index files (genome)..
    Error: Could not find Bowtie 2 index files (–library-type.*.bt2l)

The indexed genome was downloaded from Illumina. Do I have to build it after downloading it? I downloaded the genome, the GTF file and the index files, and gave their paths in the command above.

Could anyone please comment or advise on this.

Thanks for your time.

Regards,
J

RNA-Seq • 9.1k views

updated 17 months ago by Ram 44k • written 8.5 years ago by EpiExplorer ▴ 90

Thanks Goutham and WouterDeCoster. I tried but still get the same error.

Not sure if the files provided by iGenomes for Bowtie2Index are incompatible with the latest version of tophat.

The files I have in the Bowtie2Index are:

genome.1.bt2
genome.2.bt2
genome.3.bt2
genome.4.bt2
genome.fa
genome.rev.1.bt2
genome.rev.2.bt2
tophat_out
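Those six index files (.1 to .4 plus .rev.1 and .rev.2) are what a Bowtie2 index normally consists of, so a prebuilt iGenomes index should be usable without rebuilding. A minimal POSIX shell sketch for checking that a basename has a complete set, assuming the standard Bowtie2 naming (.bt2 for small indexes, .bt2l for large ones); the function name check_bt2_index is hypothetical. Bowtie2's own `bowtie2-inspect -s <basename>` is another way to confirm that an index loads.

```shell
# check_bt2_index BASENAME
# Hypothetical helper: succeeds (exit status 0) only if all six Bowtie2 index
# files exist for BASENAME, all with the .bt2 suffix or all with .bt2l.
check_bt2_index() {
  base=$1
  for ext in bt2 bt2l; do
    missing=0
    for part in 1 2 3 4 rev.1 rev.2; do
      if [ ! -e "$base.$part.$ext" ]; then
        missing=1
        break   # stop checking this suffix at the first missing file
      fi
    done
    # A complete set with this suffix was found.
    if [ "$missing" -eq 0 ]; then
      return 0
    fi
  done
  echo "incomplete or missing Bowtie2 index: $base" >&2
  return 1
}

# Example with the iGenomes path from this thread:
# check_bt2_index /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome
```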

8.5 years ago by EpiExplorer ▴ 90

I guess the problem is a typo: -library-type instead of --library-type. The error says –library-type.*.bt2l, i.e. TopHat took that token as the index basename.

8.5 years ago by GouthamAtla 12k

Thanks, but I am still getting an error. I realized that I had missed a '-' for library-type: I had written -library-type in place of --library-type. I changed it and ran the command again, but now I get a new error:

tophat: option -? not recognized for detailed help see http://ccb.jhu.edu/software/tophat/manual.shtml

8.5 years ago by EpiExplorer ▴ 90

Could you share the command that you ran? That looks like a different problem from before. Where are your indexes? Did you point the environment variable to their location?

Note, from the manual:

The basename of the genome index to be searched. The basename is the name of any of the index files up to but not including the first period. Bowtie first looks in the current directory for the index files, then looks in the indexes subdirectory under the directory where the currently-running bowtie executable is located, then looks in the directory specified in the BOWTIE_INDEXES (or BOWTIE2_INDEXES) environment variable. Please note that it is highly recommended that a FASTA file with the sequence(s) the genome being indexed be present in the same directory with the Bowtie index files and having the name <genome_index_base>.fa. If not present, TopHat will automatically rebuild this FASTA file from the Bowtie index files.

8.5 years ago by WouterDeCoster 47k

Here is the command:

tophat2 -p 5 -r 62 --library-type fr-firststrand -G /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/gene.gtf –o /home/jmotwani/RNASeq/Alignment_Tophat2 /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome read1.fastq read2.fastq

The indexes are in the location specified in the command. I have not set the environment variable; instead I have given the full path in the command.

8.5 years ago by EpiExplorer ▴ 90

Have you read what I posted from the manual?

So this means that you have to specify only the basename of the indexes, which tophat will search for in:

the current directory
the indexes subdirectory next to the bowtie executable
the directory specified by the environment variable
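The lookup order from the manual can be mimicked in a few lines of POSIX shell, which helps to see where a given basename would (or would not) be found. This is only a sketch of the documented order, not TopHat's actual code; the function name resolve_bt2_index is hypothetical, and it only looks for the .1.bt2 / .1.bt2l file of each candidate.

```shell
# resolve_bt2_index BASENAME
# Hypothetical helper: prints the first candidate prefix for which
# PREFIX.1.bt2 (or PREFIX.1.bt2l) exists, trying in order:
#   1. BASENAME as given (relative to the current directory, or a full path)
#   2. the "indexes" subdirectory next to the bowtie2 executable
#   3. the directory named by $BOWTIE2_INDEXES
resolve_bt2_index() {
  name=$1
  bt2_dir=
  if bt2_path=$(command -v bowtie2 2>/dev/null); then
    bt2_dir=$(dirname "$bt2_path")/indexes
  fi
  for prefix in "$name" \
                "${bt2_dir:+$bt2_dir/$name}" \
                "${BOWTIE2_INDEXES:+${BOWTIE2_INDEXES%/}/$name}"; do
    [ -n "$prefix" ] || continue   # skip locations that are not configured
    for ext in bt2 bt2l; do
      if [ -e "$prefix.1.$ext" ]; then
        printf '%s\n' "$prefix"
        return 0
      fi
    done
  done
  echo "index basename not found: $name" >&2
  return 1
}
```

For example, with BOWTIE2_INDEXES set to the iGenomes Bowtie2Index directory, `resolve_bt2_index genome` should print that directory followed by /genome.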

I'm not sure if it's a copy-paste issue, but I can see in your post that you have different kinds of '-': "-" and "–". This can happen when copy-pasting commands from e.g. Microsoft Word.
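One quick way to catch such look-alike characters before running anything is to scan each argument for bytes outside printable ASCII; the en dash (–, U+2013) seen in "–library-type" and "–o" in this thread is three such bytes in UTF-8. A minimal POSIX shell sketch; the function name report_suspicious_tokens is hypothetical, and it only flags tokens rather than guessing whether "-" or "--" was intended.

```shell
# report_suspicious_tokens ARG...
# Hypothetical helper: prints every argument containing a byte outside
# printable ASCII and whitespace, e.g. an en dash pasted in place of "-" or "--".
report_suspicious_tokens() {
  for tok in "$@"; do
    # In the C locale each byte is its own character, so UTF-8 dashes
    # (and any other non-ASCII text) fall outside [:print:] and [:space:].
    if printf '%s\n' "$tok" | LC_ALL=C grep -q '[^[:print:][:space:]]'; then
      printf 'suspicious token: %s\n' "$tok"
    fi
  done
}

# Usage: pass the words of the command exactly as pasted, e.g.
# report_suspicious_tokens tophat2 -p 5 -r 62 –library-type fr-firststrand
```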

8.5 years ago by WouterDeCoster 47k

I think I have managed to get rid of the previous error, but now it throws a new error!

tophat2 --num-threads 5 --mate-inner-dist 62 --library-type fr-firststrand --GTF /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf --output-dir /home/jmotwani/RNASeq/Alignment_Tophat2 /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome read1.fastq read2.fastq

Output:

[2016-05-23 22:07:53] Beginning TopHat run (v2.1.1)

[2016-05-23 22:07:53] Checking for Bowtie
          Bowtie version:    2.2.9.0
[2016-05-23 22:07:53] Checking for Bowtie index files (genome)..
[2016-05-23 22:07:53] Checking for reference FASTA file
[2016-05-23 22:07:53] Generating SAM header for /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome
[2016-05-23 22:07:55] Reading known junctions from GTF file
[2016-05-23 22:08:28] Preparing reads
    [FAILED]
Error running 'prep_reads'
Error: qual length (111) differs from seq length (106) for fastq record !

Has anyone come across this problem?

8.5 years ago by EpiExplorer ▴ 90

This tells you that your fastq file is corrupt: the quality string (111) is longer than the sequence (106). What did you use for trimming the reads? How have you modified/processed your fastq data?
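To find which records are malformed, a short awk check over the trimmed file can compare sequence and quality lengths record by record. A sketch assuming plain four-line FASTQ records (no wrapped sequence lines); the function name check_fastq_lengths is hypothetical.

```shell
# check_fastq_lengths FILE
# Hypothetical helper: reports every 4-line FASTQ record whose quality string
# length differs from its sequence length; exits non-zero if any are found.
check_fastq_lengths() {
  awk '
    NR % 4 == 1 { id = $0 }                 # header line (@read_id ...)
    NR % 4 == 2 { seqlen = length($0) }     # sequence line
    NR % 4 == 0 && length($0) != seqlen {   # quality line
      printf "record %d (%s): seq length %d, qual length %d\n", NR / 4, id, seqlen, length($0)
      bad = 1
    }
    END { exit bad }
  ' "$1"
}

# Example:
# check_fastq_lengths C95VLANXX-2046D-01-01-01_L003_R1_Trimmed.fastq
```

Running it on both trimmed files would show whether the mismatch comes from the trimming step rather than from the raw data.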

8.5 years ago by WouterDeCoster 47k

I used an in-house script (cleanadaptors) to trim the raw fastq files. I ran the trimming command as follows:

cleanadaptors -I /home/jmotwani/RNASeq/contam.fa -q 20 -x 25 -F C95VLANXX-2046D-01-01-01_L003_R1.fastq -o C95VLANXX-2046D-01-01-01_L003_R1_Trimmed.fastq -G C95VLANXX-2046D-01-01-01_L003_R2.fastq -O C95VLANXX-2046D-01-01-01_L003_R2_trimmed.fastq

-q is for the quality threshold and -x is for the minimum read length.

8.5 years ago by EpiExplorer ▴ 90

I can't help you with an in-house script, obviously.

Please use the comment options correctly to enable threading of questions and replies.

8.5 years ago by WouterDeCoster 47k

score 1 · Answer 1 · 2016-05-22

8.5 years ago

GouthamAtla 12k

--BOWTIE2_INDEXES is not a tophat2 option, so you don't need to mention it. Just give the basename of the Bowtie2 index.

tophat2 -p 5 -r 62 --library-type fr-firststrand \
-G /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf \
-o /home/jmotwani/RNASeq/Alignment_Tophat2 \
/home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome \
C95VLANXX-2046D-01-01-01_L003_R1_Trimmed.fastq C95VLANXX-2046D-01-01-01_L003_R2_Trimmed.fastq
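The basename is the index file name up to its first period, so for the iGenomes Bowtie2Index directory above it is .../Bowtie2Index/genome. If you are unsure which basename a directory holds, a small POSIX shell sketch can list it; the function name list_bt2_basenames is hypothetical, and it skips the .rev.1 files because they share the basename of the forward index.

```shell
# list_bt2_basenames DIR
# Hypothetical helper: prints DIR/BASENAME for every Bowtie2 index in DIR,
# i.e. the value to pass to tophat2 as the genome index argument.
list_bt2_basenames() {
  dir=$1
  for f in "$dir"/*.1.bt2 "$dir"/*.1.bt2l; do
    [ -e "$f" ] || continue                  # glob matched nothing
    case $f in
      *.rev.1.bt2|*.rev.1.bt2l) continue ;;  # reverse index: same basename
    esac
    name=${f##*/}                            # strip the directory part
    name=${name%.1.bt2l}                     # strip large-index suffix
    name=${name%.1.bt2}                      # strip small-index suffix
    printf '%s\n' "$dir/$name"
  done
}

# Example:
# list_bt2_basenames /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index
```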

8.5 years ago by GouthamAtla 12k

If I'm not mistaken, BOWTIE2_INDEXES is an environment variable pointing to a directory in which tophat will search for the indexes.

8.5 years ago by WouterDeCoster 47k

It's an environment variable, but not an argument for tophat2, I guess.

8.5 years ago by GouthamAtla 12k

I agree :-) You indeed just specify the basename for tophat2.

8.5 years ago by WouterDeCoster 47k

score 0 · Answer 2 · 2016-05-27

Thanks all, tophat2 works fine now. The issue was the missing double hyphen for the library-type parameter. Also, tophat2 doesn't seem to like a mixture of option types: it accepted the options when they were all given with a single hyphen or all with a double hyphen, but not when I mixed the two.