Align short reads to large multi reference genome
9.4 years ago
Sandeep ▴ 260

I have been trying to align approximately 3 million short sequences (17-35 nucleotides long) to a multi-FASTA file of prokaryotic genomes. My reference FASTA is about 10 GB. I used bowtie to build the index, and the resulting index files have the extension .ebwtl (large index).

Now, when I try to align using the command

bowtie combined -p 12 -l 17 -a -m5 --best ../seq.fastq -v 2 -S test.sam

where combined is the name of the index, I get the error: Could not locate a Bowtie index corresponding to basename "combined"

I ran the same search with blastn using the option -task blastn-short, and it took around 5 days to finish. I also tried the SHRiMP aligner, and that throws an error as well.

I have many such query files and it's not feasible to wait around 5 days for each result. I also looked into a tool called MALT, but it does not have an option to align short queries; its default E-value is 50.

Any suggestions to get bowtie working for the index I have?

PS: I used cat command to build the multi fasta reference file.
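A quick way to rule out a missing or misnamed index is to check which index files actually sit next to the basename. The sketch below assumes the usual Bowtie 1 naming (combined.1.ebwt for a small index, combined.1.ebwtl for a large one); index_kind is a hypothetical helper name, not part of bowtie.

```shell
# Report which kind of Bowtie 1 index exists for a given basename.
# Small indexes end in .ebwt, large (64-bit) indexes end in .ebwtl.
index_kind() {
    base=$1
    if [ -e "${base}.1.ebwtl" ]; then
        echo large
    elif [ -e "${base}.1.ebwt" ]; then
        echo small
    else
        echo missing
    fi
}

# Run from the directory that holds the index files.
index_kind combined
```

If this prints "large" while bowtie still reports that it cannot locate the index, the files are present and the aligner is most likely only looking for the small .ebwt variant.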

bowtie ebwtl • 4.3k views

What does your bowtie2-build command look like? (Are you really using Bowtie, or are you using bowtie2?) The multi-fasta part should not matter because most references are multi-fasta, whether it be a genome, transcriptome, or something else. So I think it would either be the building of the index or maybe you specified a path in which you stored the combined index? It could also be that there is a problem with the cat cmd you used to create the multi-fasta.


I used bowtie, not bowtie2. I built the index twice, once using the command

bowtie-build combined.fasta combined

and also using

bowtie-build --large-index combined.fasta combined

Both indexes give the same error. Also, note that the same fasta file when used with BLAST gives the output, but the time taken is really long.


Okay, the weird thing is that it says it cannot locate the index, so I was expecting something to be wrong with the path, or a spelling error or something... but if you did this in the same folder, that is indeed strange :S

I'm sorry I can't help, I only use Bowtie2 and when trying your cmd line input (albeit with a smaller reference and only 2 read files) with Bowtie2 I don't get an error.


When the FASTA file is about 4-5 GB, the bowtie index extension remains .ebwt and it works fine. After adding more reference sequences, the extension becomes .ebwtl and it fails :(. The problem seems to be with the large index.
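To see on which side of the small/large boundary a reference falls, one can count its bases. This is only a sketch: the limit used here (2^32 - 1, what a 32-bit offset can address) is an assumption based on the roughly 4 billion nucleotide threshold described for bowtie-build, and count_bases and needs_large_index are hypothetical helper names.

```shell
# Count sequence characters in a FASTA file (header lines and newlines excluded).
count_bases() {
    grep -v '^>' "$1" | tr -d '\r\n' | wc -c | tr -d ' '
}

# Assumed limit: the largest length a 32-bit offset can address.
limit=4294967295

# Print "yes" if the reference is longer than the assumed limit, else "no".
needs_large_index() {
    n=$(count_bases "$1")
    if [ "$n" -gt "$limit" ]; then
        echo yes
    else
        echo no
    fi
}

# Only runs when the reference from the question is present.
if [ -f combined.fasta ]; then
    needs_large_index combined.fasta
fi
```

On a 10 GB file this takes a while, since every sequence line is read once.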


Make sure that the version of bowtie-build and bowtie are the same. Only the most recent version(s?) of bowtie are supposed to support large indexes, so if you happen to be using two different versions then that's likely the problem. In general, I think most people using large indices are using bowtie2, so you might have better luck with that, since it's more likely to work.


bowtie is installed in the PATH. Running "which bowtie" and "which bowtie-build" shows that both come from the same version (bowtie-1.1.1).
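Since which only prints the path of each executable, it can help to ask both tools for their version directly. A minimal sketch, assuming both accept --version; tool_report is a hypothetical helper name:

```shell
# Print where a tool is found on PATH and the first line of its --version output.
tool_report() {
    tool=$1
    if command -v "$tool" >/dev/null 2>&1; then
        printf '%s -> %s\n' "$tool" "$(command -v "$tool")"
        "$tool" --version | head -n 1
    else
        printf '%s not found on PATH\n' "$tool"
    fi
}

for t in bowtie bowtie-build; do
    tool_report "$t"
done
```

If the two paths point into different installation directories, the index may have been built by a different release than the one doing the alignment.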

I did not use bowtie2 because my query sequences are short (~20 nt) and bowtie2 is supposed to be for longer reads. I am currently downloading the latest version, bowtie 1.1.2, and will check again after rebuilding the index. It takes about 10-20 hours to build the index for a 10 GB reference.


There's no need to rebuild the index. Just ensure that you're in the same directory as the indices.


Yes, I am in the same directory.


I did that and still get the same error. I even tried the latest version of bowtie and built the index with it. Same problem.

9.2 years ago
KatjaS ▴ 10

When aligning against a large genome (index files with the .ebwtl extension), specify the additional option --large-index. There seems to be a bug in Bowtie that stops it from recognising a large index unless this option is given.
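Applied to the command from the question, that would look roughly like the sketch below. All other options are copied unchanged from the original post; the RUN guard is an addition for illustration, so that the command is only printed unless RUN=1 is set and bowtie is on the PATH.

```shell
# Original command from the question with --large-index added.
cmd="bowtie --large-index combined -p 12 -l 17 -a -m 5 --best ../seq.fastq -v 2 -S test.sam"

# Show the command that would be run.
echo "$cmd"

# Execute only when explicitly requested and bowtie is available.
if [ "${RUN:-0}" = 1 ] && command -v bowtie >/dev/null 2>&1; then
    $cmd
fi
```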
