When I used a UCSC gtf file and fasta file (genome.fa) to run tophat2, I was able to run it without any problems.
However, when I used a NONCODE mouse mm10 gtf file and NONCODE mouse fa file downloaded from http://www.noncode.org/download.php, I got “Error: Couldn't build bowtie index with err = 1” as follows:
[2016-06-30 00:49:29] Beginning TopHat run (v2.0.10)
-----------------------------------------------
[2016-06-30 00:49:29] Checking for Bowtie
Bowtie version: 2.1.0.0
[2016-06-30 00:49:29] Checking for Samtools
Samtools version: 0.1.18.0
[2016-06-30 00:49:29] Checking for Bowtie index files (genome)..
[2016-06-30 00:49:29] Checking for reference FASTA file
[2016-06-30 00:49:29] Generating SAM header for Bowtie2Index/genome
[2016-06-30 00:49:32] Reading known junctions from GTF file
[2016-06-30 00:49:35] Preparing reads
left reads: min. length=50, max. length=50, 29800718 kept reads (16597 discarded)
[2016-06-30 00:54:48] Building transcriptome data files ./result/346_mm10/tmp/NONCODE2016_mouse_mm10_lncRNA
[2016-06-30 00:54:51] Building Bowtie index from NONCODE2016_mouse_mm10_lncRNA.fa
[FAILED]
Error: Couldn't build bowtie index with err = 1
I renamed the NONCODE mouse fa file as genome.fa in the Bowtie2Index/ directory and kept the other 7 existing files (all have base name “genome”: genome.1.bt2, genome.3.bt2, genome.rev.1.bt2, genome.2.bt2, genome.4.bt2, genome.fa.fai, genome.rev.2.bt2) from the previous UCSC Bowtie2Index/ directory. I am wondering if this may cause any problems?
I also tried (in the Bowtie2Index/ directory):
bowtie-inspect -n genome
and got the following error message:
Could not locate a Bowtie index corresponding to basename "genome"
I also ran this command in the original UCSC Bowtie2Index/ directory and got the same error message, even though tophat2 runs fine with the UCSC data.
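A note on the bowtie-inspect error above: one possible explanation, offered as an assumption rather than a confirmed diagnosis, is that bowtie-inspect is the Bowtie 1 tool and looks for .ebwt index files, while the files listed above are Bowtie 2 (.bt2) indexes, for which bowtie2-inspect is the matching tool. The small Python sketch below only classifies file names by suffix; the function names are made up for illustration.

```python
from pathlib import Path

# Suffixes used by the two index formats (small and large variants).
BOWTIE1_SUFFIXES = (".ebwt", ".ebwtl")
BOWTIE2_SUFFIXES = (".bt2", ".bt2l")


def index_flavours(file_names, basename="genome"):
    """Return the set of index formats found for `basename` among file names."""
    found = set()
    for name in file_names:
        # Only consider files that belong to this basename, e.g. genome.1.bt2
        if not name.startswith(basename + "."):
            continue
        if name.endswith(BOWTIE1_SUFFIXES):
            found.add("bowtie1")
        elif name.endswith(BOWTIE2_SUFFIXES):
            found.add("bowtie2")
    return found


def index_flavours_in_dir(directory, basename="genome"):
    """Same check, applied to the files in a directory on disk."""
    return index_flavours((p.name for p in Path(directory).iterdir()), basename)
```

If index_flavours_in_dir("Bowtie2Index") reports only "bowtie2", then a failure of bowtie-inspect in that directory would say nothing about whether the index itself is intact.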
I would really appreciate any solution for this problem.
Thank you very much!
Thank you so much for your advice! I am now trying to create a set of new index files with the NONCODE file using bowtie2-build as follows:
Where genome.fa is the NONCODE2016_mouse.fa I downloaded from http://www.noncode.org/download.php. I was able to successfully create six index files (genome.2.bt2, genome.4.bt2, genome.rev.1.bt2, genome.1.bt2, genome.3.bt2, genome.rev.2.bt2). However, when I rerun tophat2 with these new index files, I still get: “Error: Couldn't build bowtie2 index with err = 1” at the "Building Bowtie index" step.
When I ran bowtie2-inspect -n genome, the output was very large; the following is a small portion:
It doesn't seem to match the first column of the NONCODE mouse mm10 gtf file, rather it seems to match the transcript_id instead. The following is a small portion of my gtf file:
I am wondering if I produced the bowtie2 index files correctly? Do I need to add some command line options to bowtie2-build, or do I need to do some pre-processing with the original fasta file? I would really appreciate your help.
Thank you very much!
First of all, are you doing these steps in a new directory, starting with just the NONCODE genome file in there? It is a bad idea to mix datasets.
If I am looking at the right files, then NONCODE2016_mouse.fa contains the sequence of just the non-coding part, whereas NONCODE2016_mouse_mm10_lncRNA.gtf contains annotation data that references the entire genome. So these two are not going to work together.
I think you are going to need to make a transcriptome-specific TopHat index with just the GTF file and the mm10 genome. There is a section in the TopHat manual about that (look for the -G option). There are threads on Biostars about this as well.
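The two steps suggested above could be laid out roughly as follows. This is only a sketch: the file names mm10.fa and genes.gtf, the index basename, and the transcriptome_data/known prefix are placeholders, and the helper names are made up. The tophat2 form with -G and --transcriptome-index and no reads follows the pattern given in the TopHat manual for building a transcriptome index on its own.

```python
import shlex


def build_commands(genome_fasta, gtf, index_base, transcriptome_prefix):
    """Return the two command lines as argument lists (nothing is executed)."""
    # Step 1: index the full mm10 genome, not the lncRNA-only FASTA.
    bowtie2_build = ["bowtie2-build", genome_fasta, index_base]
    # Step 2: let TopHat build the transcriptome index from the GTF
    # against that genome index (no read files given).
    tophat_index = [
        "tophat2",
        "-G", gtf,
        f"--transcriptome-index={transcriptome_prefix}",
        index_base,
    ]
    return [bowtie2_build, tophat_index]


def as_shell(commands):
    """Render argument lists as shell-quoted strings for copy and paste."""
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in commands]


# Placeholder paths, to be replaced with the real ones.
for line in as_shell(build_commands(
        "mm10.fa", "genes.gtf", "mm10_index/genome", "transcriptome_data/known")):
    print(line)
```

The key point is that the basename passed to tophat2 refers to an index of the genome whose chromosome names appear in the first column of the GTF.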
Thank you very much for the suggestion!
I created the bowtie2 index files in a new directory containing only the NONCODE genome file, so the datasets should not be mixed.
As you suggested, I tried to make a transcriptome-specific TopHat index using the following command (based on the TopHat manual):
Where genes.gtf is NONCODE2016_mouse_mm10_lncRNA.gtf and Bowtie2Index/genome is the base name of the bowtie2 index files created from NONCODE2016_mouse.fa. However, I still get the following error when I run the above command:
Since you mentioned that NONCODE2016_mouse.fa only contains the non-coding portion, I'm wondering whether it is sufficient to use the bowtie2 index files created by NONCODE2016_mouse.fa to make a transcriptome specific TopHat index?
What also worries me is that the output of the "bowtie2-inspect -n genome" command gave only the transcript_id's rather than the chromosomes. I'm not sure if this may be a problem?
Thank you very much for your help!
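On the worry about bowtie2-inspect listing transcript IDs: one way to check whether that matters is to compare the sequence names in the FASTA file with the first column of the GTF file. The sketch below assumes standard FASTA headers (name is the first token after ">") and tab-separated GTF lines; the function names are made up for illustration.

```python
def fasta_names(lines):
    """Collect sequence names: the first whitespace-delimited token after '>'."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def gtf_seqnames(lines):
    """Collect the values of the first (seqname) column of a GTF file."""
    names = set()
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def missing_from_reference(fasta_lines, gtf_lines):
    """GTF seqnames that have no sequence of the same name in the FASTA."""
    return gtf_seqnames(gtf_lines) - fasta_names(fasta_lines)


def check_files(fasta_path, gtf_path):
    """Run the comparison on two files on disk."""
    with open(fasta_path) as fasta, open(gtf_path) as gtf:
        return missing_from_reference(fasta, gtf)
```

If check_files("NONCODE2016_mouse.fa", "NONCODE2016_mouse_mm10_lncRNA.gtf") returns chromosome names, the GTF refers to sequences that the index built from that FASTA does not contain, which would be consistent with the explanation given in the reply above.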