I am using TopHat in Galaxy (Galaxy version 0.9). In the TopHat tool’s setup menu in Galaxy, there is an option, “Use a built in reference genome or own from your history”, with two choices: “Use a built-in genome” and “Use a genome from history”. When I choose “Use a genome from history”, I can provide TopHat with my own reference genome FASTA file in my workflow.
However, this means that every time TopHat runs, it first builds the bowtie2 index files on the fly from the reference genome FASTA file I provided. This slows the process down a lot, and it also fails with an “insufficient memory” error (bowtie2-build aborted) when I run TopHat twice, for two FASTQ samples, in one workflow.
So I am wondering if there is a way to point TopHat in Galaxy at our pre-built bowtie2 index files, so that multiple TopHat runs in my workflow can reuse them instead of rebuilding them on the fly each time. I searched through the TopHat tool’s setup menu in Galaxy but could not find an option like this.
So, is there a way to store our pre-built bowtie2 index files in the database and make them available as “a built-in genome”? If so, what are the required steps?
Since I need to use different annotations for different workflows, I am also wondering whether we can have multiple “built-in genomes” for the same organism.
I would greatly appreciate any help on this problem.
Thank you very much!
Are you using the public Galaxy or a local mirror?
I don't think there is a way to do that on PSU Galaxy (it is technically possible, but they can't accommodate custom indexes for every user), but you can easily do it on a local mirror. Talk to your local Galaxy admins: they can build and register as many genome indexes as you want, and each one then becomes available in the built-in genome drop-down menu.
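For the admin side, here is a rough sketch of what that usually involves, assuming a standard local install where the TopHat wrapper reads its built-in genomes from `tool-data/bowtie2_indices.loc` (four tab-separated columns: unique ID, dbkey, display name, index base path). All paths, IDs and display names below are made up for illustration, and your instance may be laid out differently, so check with your admin. The script only prints the one-time `bowtie2-build` command and the `.loc` rows; it does not run anything. Two rows with the same dbkey but different unique IDs is how I would expect several "built-in" entries for one organism to be listed.

```python
def bowtie2_build_command(fasta, index_base):
    """Return the argv for building a Bowtie 2 index once, ahead of time.

    Usage of the real tool: bowtie2-build <reference_in> <bt2_base>
    """
    return ["bowtie2-build", str(fasta), str(index_base)]


def loc_line(build_id, dbkey, display_name, index_base):
    """Format one tab-separated row for a bowtie2_indices.loc-style file.

    Column order assumed here: unique ID, dbkey, display name, index base path.
    """
    fields = [str(build_id), str(dbkey), str(display_name), str(index_base)]
    if any("\t" in field or "\n" in field for field in fields):
        raise ValueError("fields must not contain tabs or newlines")
    return "\t".join(fields)


if __name__ == "__main__":
    # Hypothetical locations; replace with the paths used on your server.
    fasta = "/data/genomes/hg19/hg19.fa"
    base = "/data/indexes/hg19/bowtie2/hg19"

    # 1) Command the admin would run once to build the index files.
    print(" ".join(bowtie2_build_command(fasta, base)))

    # 2) Rows to append to the .loc file. Same organism (dbkey), two entries,
    #    each with its own unique ID and display name.
    print(loc_line("hg19", "hg19", "Human (hg19)", base))
    print(loc_line("hg19_custom", "hg19", "Human (hg19, custom build)",
                   "/data/indexes/hg19_custom/bowtie2/hg19_custom"))
```

After the `.loc` file is edited, Galaxy typically needs a restart (or a data table reload) before the new entries show up in the drop-down.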
Thank you so much for your advice!
I’m using a local mirror, and I’m glad to know we can add our own genome indexes.
I’ll talk to our Galaxy admin and give it a try.
Thank you very much!