I have been trying to align approximately 3 million short sequences (17-35 nucleotides long) to a multi-FASTA file of prokaryotic genomes. My reference FASTA is about 10 GB. I used bowtie to build the index, and the resulting index files have the extension .ebwtl (large index).
Now, when I try to align using the command
bowtie combined -p 12 -l 17 -a -m5 --best ../seq.fastq -v 2 -S test.sam
where combined is the name of the index, I get the error: Could not locate a Bowtie index corresponding to basename "combined"
I ran the same query with blastn using the short task option, and it took around 5 days to finish. I also tried the SHRiMP aligner, and that throws an error too.
I have many such query files, and it's not feasible to wait around 5 days for each result. I also looked into a tool called MALT, but it does not have an option to align short queries; here the default e-value is 50.
Any suggestions to get bowtie working for the index I have?
PS: I used the cat command to build the multi-FASTA reference file.
What does your bowtie2-build command look like? (Are you really using Bowtie, or are you using bowtie2?) The multi-FASTA part should not matter, because most references are multi-FASTA, whether it be a genome, transcriptome, or something else. So I think the problem is either in how the index was built, or in the path where you stored the combined index. It could also be that there is a problem with the cat command you used to create the multi-FASTA.
I used bowtie, not bowtie2. I built it twice, once using the command
and also using
Both indexes give the same error. Also, note that the same fasta file when used with BLAST gives the output, but the time taken is really long.
Owhkee, the weird thing is it says that it cannot locate the index, so I was expecting there to be something wrong with the path, or a spelling error or something... but if you did this in the same folder, that is indeed strange :S
I'm sorry I can't help, I only use Bowtie2 and when trying your cmd line input (albeit with a smaller reference and only 2 read files) with Bowtie2 I don't get an error.
When the FASTA reference is ~4-5 GB, the bowtie index extension remains .ebwt and it works fine. After adding more references, the extension becomes .ebwtl and it fails :(. The problem seems to be with the large index.
Make sure that the version of bowtie-build and bowtie are the same. Only the most recent version(s?) of bowtie are supposed to support large indexes, so if you happen to be using two different versions then that's likely the problem. In general, I think most people using large indices are using bowtie2, so you might have better luck with that, since it's more likely to work.
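A quick way to compare the two tools is to print what each one reports for --version. This is only a minimal sketch: show_versions is a hypothetical helper name, and it simply prints the first line of each tool's --version output so the two can be compared by eye.

```shell
# Hypothetical helper: print the first line of `--version` for each tool
# named on the command line, or a note if the tool is not on the PATH.
show_versions() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s: %s\n' "$tool" "$("$tool" --version 2>&1 | head -n 1)"
        else
            printf '%s: not found on PATH\n' "$tool"
        fi
    done
}
```

Calling it as show_versions bowtie bowtie-build should make a mismatch between the aligner and the index builder easy to spot.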
bowtie is installed in the PATH. The commands
which bowtie
and
which bowtie-build
give the same version output (bowtie-1.1.1). The reason I did not use bowtie2 is that my query sequences are short, ~20 nt long, and bowtie2 is supposed to be for longer read lengths. I am currently downloading the latest version, bowtie 1.1.2, and will check after building the index. It takes about 10-20 hours to build the index for a 10 GB reference.
There's no need to rebuild the index. Just ensure that you're in the same directory as the indices.
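To confirm that a complete index is actually visible from where you run bowtie, something like the following could help. It is a sketch, not part of bowtie itself: index_kind is a hypothetical helper name, and it relies on a Bowtie 1 index consisting of six files (.1, .2, .3, .4, .rev.1, .rev.2) ending in .ebwt for a small index or .ebwtl for a large one.

```shell
# Hypothetical helper: report whether a complete small (.ebwt) or large
# (.ebwtl) Bowtie 1 index exists for the given basename.
index_kind() {
    base=$1
    for ext in ebwt ebwtl; do
        missing=0
        # A complete index has all six component files.
        for part in 1 2 3 4 rev.1 rev.2; do
            [ -f "$base.$part.$ext" ] || missing=1
        done
        if [ "$missing" -eq 0 ]; then
            if [ "$ext" = ebwtl ]; then
                echo large
            else
                echo small
            fi
            return 0
        fi
    done
    echo none
    return 1
}
```

Running index_kind combined in the directory you align from prints small, large, or none. If it prints large and bowtie still cannot locate the index, the files themselves are probably not the issue, and it may be worth checking which bowtie executable is actually being run.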
Yes, I am in the same directory.
I did that and it's still the same. I even tried the latest version of bowtie and built the index with it. Same problem.