Question

Can I Pre-Build The Shared Transcriptome Index For A Series Of TopHat Runs?

13.2 years ago

Ryan Thompson ★ 3.7k

I would like to use TopHat's ability to share a pre-built transcriptome index (the --transcriptome-index option). However, I am going to start several TopHat runs in parallel on a cluster, and I worry that the first few runs to start will all try to build the transcriptome index and save it to the shared location at the same time, causing problems. Is it possible to have TopHat pre-build the transcriptome index without doing any mapping, so that the index is already built before I launch any of my mapping jobs? Could I do this by simply running TopHat with an empty FASTQ file and throwing away the result?

tophat rna-seq • 5.6k views

Updated 10.6 years ago by Anil Kesarwani ▴ 90 • written 13.2 years ago by Ryan Thompson ★ 3.7k

score 2 · Answer 1 · 2012-06-13

I watched what TopHat does to create the index, and I came up with the following script to do the same thing:

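#!/bin/bash
# Pre-build a TopHat transcriptome index: transcriptome fasta plus Bowtie 1 and Bowtie 2 indexes.
# Arguments: ANNOTATION GENOME INDEX_BASE (GENOME is the prefix of GENOME.fa)

if [ "$#" -ne 3 ]; then
  echo "Usage: $0 ANNOTATION GENOME INDEX_BASE" >&2
  exit 1
fi

ANNOTATION="$1"
GENOME="$2"
INDEX_BASE="$3"

if [ -f "$INDEX_BASE.fa" ]; then
  # Fall through instead of exiting, so that any missing Bowtie index still gets built below.
  echo "Transcriptome fasta file already exists at $INDEX_BASE.fa"
else
  echo "Building transcriptome fasta file at $INDEX_BASE.fa for $ANNOTATION:($GENOME.fa)"
  gtf_to_fasta "$ANNOTATION" "$GENOME.fa" "$INDEX_BASE.fa" || {
    echo "gtf_to_fasta failed with exit code $?" >&2
    exit 1
  }
  # A freshly built fasta file invalidates any index built from an older one.
  if [ -f "$INDEX_BASE.1.ebwt" ]; then
    echo "Forcing rebuild of Bowtie 1 index"
    rm -f "$INDEX_BASE".*.ebwt
  fi
  if [ -f "$INDEX_BASE.1.bt2" ]; then
    echo "Forcing rebuild of Bowtie 2 index"
    rm -f "$INDEX_BASE".*.bt2
  fi
fi

if [ -f "$INDEX_BASE.1.ebwt" ]; then
  echo "Already built Bowtie 1 index"
else
  echo "Building Bowtie 1 index"
  bowtie-build "$INDEX_BASE.fa" "$INDEX_BASE" || exit 1
fi

if [ -f "$INDEX_BASE.1.bt2" ]; then
  echo "Already built Bowtie 2 index"
else
  echo "Building Bowtie 2 index"
  bowtie2-build "$INDEX_BASE.fa" "$INDEX_BASE" || exit 1
fi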
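
If several cluster jobs might call a build script like the one above at the same moment, one way to address the race described in the question is to serialize them with a lock. The following is only a sketch: the function name build_index_once, the lock directory, the marker file and the LOCK_POLL_SECONDS variable are all made up for illustration, and it assumes that mkdir is atomic on your shared filesystem.

```shell
#!/bin/bash
# Hypothetical helper (not part of TopHat): run a build command at most once
# across many parallel jobs that share a filesystem.
# Usage: build_index_once LOCK_DIR DONE_MARKER COMMAND [ARGS...]
build_index_once() {
  local lock_dir="$1" done_marker="$2"
  shift 2
  # Loop until the index is marked complete or this job takes the lock.
  while [ ! -e "$done_marker" ]; do
    # mkdir either creates the directory or fails, so only one job wins.
    if mkdir "$lock_dir" 2>/dev/null; then
      local status=0
      # Re-check: another job may have finished just before we got the lock.
      if [ ! -e "$done_marker" ]; then
        "$@" && touch "$done_marker" || status=$?
      fi
      rmdir "$lock_dir"
      return "$status"
    fi
    # Someone else is building; wait and look again.
    sleep "${LOCK_POLL_SECONDS:-10}"
  done
  return 0
}
```

Each mapping job would then start with something like `build_index_once "$INDEX_BASE.lock" "$INDEX_BASE.done" ./build_index.sh genes.gtf genome "$INDEX_BASE"` (the script and file names here are placeholders). Note that a job killed while holding the lock leaves the lock directory behind, and the other jobs will wait until it is removed by hand.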

score 0 · Answer 2 · 2012-06-12

13.2 years ago

Steve Lianoglou 5.2k

If your empty FASTQ file doesn't work, how about just using a FASTQ file with one record in it?

Once the transcriptome index is built, move it somewhere that all the nodes in your cluster can see, and I think you should be good.
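
A one-record FASTQ file like the one suggested here can be generated as follows. This is a rough sketch: the function name write_dummy_fastq, the read name and the sequence are invented for illustration, and I have not checked whether TopHat finishes cleanly on such an input.

```shell
#!/bin/bash
# Hypothetical helper: write a FASTQ file containing a single made-up read.
write_dummy_fastq() {
  local out="$1"
  local seq="ACGTACGTACGTACGTACGTACGTACGTACGTACGT"
  # One quality character per base, so both lines always have the same length.
  local qual="${seq//?/I}"
  # A FASTQ record is four lines: @name, sequence, "+", qualities.
  printf '@dummy_read\n%s\n+\n%s\n' "$seq" "$qual" > "$out"
}
```

After `write_dummy_fastq dummy.fastq`, the throwaway run could look something like `tophat -o throwaway_out -G genes.gtf --transcriptome-index=transcriptome_data/known genome dummy.fastq`, where all paths are placeholders for your own files.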

score 0 · Answer 3 · 2014-12-23

10.6 years ago

Anil Kesarwani ▴ 90

Create the transcriptome index during the first TopHat run. Subsequent TopHat runs will then reuse the same index.
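
The build-once, reuse-afterwards pattern might be wrapped up as below. Treat this as a sketch under assumptions: the function names and argument order are my own, and it relies on my reading of the TopHat manual that, from version 2.0.10 onward, TopHat can be run with only -G and --transcriptome-index (no reads) to build the index, and that later runs pointing at the same --transcriptome-index prefix no longer need -G. On older versions the first call would need reads as well.

```shell
#!/bin/bash
# Hypothetical wrappers around tophat; names and argument order are illustrative.

# Build the shared transcriptome index once, without mapping any reads
# (assumes a TopHat version that supports an index-only run).
build_shared_index() {
  local gtf="$1" index_prefix="$2" genome="$3"
  tophat -G "$gtf" --transcriptome-index="$index_prefix" "$genome"
}

# Map one sample against the already-built index.
map_sample() {
  local index_prefix="$1" genome="$2" reads="$3" out_dir="$4"
  tophat -o "$out_dir" --transcriptome-index="$index_prefix" "$genome" "$reads"
}
```

For example, run `build_shared_index genes.gtf transcriptome_data/known genome` once, wait for it to finish, and only then launch the parallel jobs, each calling `map_sample transcriptome_data/known genome sampleN.fastq out_sampleN` (all file names are placeholders).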

Not a great idea, because OP doesn't want to spend the time it takes to run TopHat just to create the index. Also, this thread is 2.5 years old :)

Reply • 10.6 years ago by Ram 45k