Can I Pre-Build The Shared Transcriptome Index For A Series Of Tophat Runs?
12.5 years ago
Ryan Thompson ★ 3.6k

I would like to use Tophat's ability to share a pre-built transcriptome index (the --transcriptome-index option). However, I am going to start several Tophat runs in parallel on a cluster, and I worry the first few runs to start will all try to build the transcriptome index and possibly all try to save it to the shared location at the same time, causing problems. Is it possible to have Tophat pre-build the transcriptome index without actually doing any mapping, so that I can have the index already built before launching any of my mapping jobs? Could I implement this by simply running Tophat with an empty fastq file and throwing away the result?

tophat rna-seq

12.5 years ago
Ryan Thompson ★ 3.6k

I looked at the commands Tophat runs to create the index and wrote the following script to do the same thing:

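#!/bin/bash
# Pre-build the transcriptome index that Tophat's --transcriptome-index points to,
# repeating the steps Tophat runs itself: gtf_to_fasta, then bowtie-build / bowtie2-build.
# Usage: build_transcriptome_index.sh <annotation.gtf> <genome_base> <index_base>
#   <genome_base>.fa must be the genome FASTA that matches the Bowtie genome index.
set -euo pipefail

if [ "$#" -ne 3 ]; then
  echo "Usage: $0 <annotation.gtf> <genome_base> <index_base>" >&2
  exit 2
fi

ANNOTATION="$1"
GENOME="$2"
INDEX_BASE="$3"

if [ -f "$INDEX_BASE.fa" ]; then
  # Do not exit here: fall through so that missing Bowtie indexes still get built.
  echo "Transcriptome FASTA already built at $INDEX_BASE.fa"
else
  echo "Building transcriptome FASTA $INDEX_BASE.fa for $ANNOTATION ($GENOME.fa)"
  gtf_to_fasta "$ANNOTATION" "$GENOME.fa" "$INDEX_BASE.fa" || {
    status=$?
    # Remove a partial FASTA so the next run does not treat it as finished.
    rm -f "$INDEX_BASE.fa"
    echo "gtf_to_fasta failed with exit code $status" >&2
    exit 1
  }
  # A new transcriptome FASTA invalidates Bowtie indexes built from an older one.
  if [ -f "$INDEX_BASE.1.ebwt" ]; then
    echo "Forcing rebuild of Bowtie 1 index"
    rm -f "$INDEX_BASE".*.ebwt
  fi
  if [ -f "$INDEX_BASE.1.bt2" ]; then
    echo "Forcing rebuild of Bowtie 2 index"
    rm -f "$INDEX_BASE".*.bt2
  fi
fi

if [ -f "$INDEX_BASE.1.ebwt" ]; then
  echo "Already built Bowtie 1 index"
else
  echo "Building Bowtie 1 index"
  # On failure, delete partial index files so a rerun rebuilds them.
  bowtie-build "$INDEX_BASE.fa" "$INDEX_BASE" || { rm -f "$INDEX_BASE".*.ebwt; exit 1; }
fi

if [ -f "$INDEX_BASE.1.bt2" ]; then
  echo "Already built Bowtie 2 index"
else
  echo "Building Bowtie 2 index"
  bowtie2-build "$INDEX_BASE.fa" "$INDEX_BASE" || { rm -f "$INDEX_BASE".*.bt2; exit 1; }
fi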
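
If you would rather have every cluster job call a pre-build script like the one above, the builds still have to be serialized so that only one job writes the index. A minimal sketch, assuming the script is saved as `build_transcriptome_index.sh` (a hypothetical name) and that `mkdir` is atomic on your shared filesystem, uses a lock directory: the first job to create it builds the index, and the others wait and then find the index already present. `with_index_lock` and `LOCK_POLL_SECONDS` are illustrative names, not part of Tophat.

```shell
#!/bin/bash
# Sketch: serialize transcriptome-index builds across concurrent cluster jobs.
# Assumption: mkdir is atomic on the shared filesystem, so only one job at a
# time can create the lock directory.
set -euo pipefail

LOCK_POLL_SECONDS="${LOCK_POLL_SECONDS:-10}"  # hypothetical polling interval

# with_index_lock <index_base> <command> [args...]
# Runs <command> while holding the lock directory "<index_base>.lock.d".
with_index_lock() {
  local lockdir="$1.lock.d"
  shift
  mkdir -p "$(dirname "$lockdir")"
  # Wait until this job is the one that manages to create the lock directory.
  until mkdir "$lockdir" 2>/dev/null; do
    sleep "$LOCK_POLL_SECONDS"
  done
  # Run the command, remember its exit status, and always release the lock.
  local status=0
  "$@" || status=$?
  rmdir "$lockdir"
  return "$status"
}

# Only do real work when called as: <annotation.gtf> <genome_base> <index_base>
if [ "$#" -eq 3 ]; then
  with_index_lock "$3" ./build_transcriptome_index.sh "$1" "$2" "$3"
fi
```

If a job is killed while holding the lock, the `.lock.d` directory stays behind and the other jobs wait forever, so it has to be removed by hand; `flock(1)` avoids that, provided your filesystem supports it.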
12.5 years ago

If an empty FASTQ file doesn't work, try a FASTQ file containing a single read.

Once the transcriptome index is built, move it to a location that every node in your cluster can read, and the parallel runs should then be able to share it.
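
The one-read trick could look like this. It is a sketch assuming Tophat 2 is on the `PATH`; `-G`, `--transcriptome-index` and `-o` are Tophat options, while the dummy read, the function names and the scratch directory are illustrative placeholders. Only the files written under the `--transcriptome-index` prefix are kept; the alignment output is thrown away.

```shell
#!/bin/bash
# Sketch: build the Tophat transcriptome index by mapping one dummy read,
# then discard the alignment output.
set -euo pipefail

# write_dummy_fastq <path>: write one 50-bp placeholder FASTQ record.
write_dummy_fastq() {
  local seq qual
  seq="$(printf 'ACGT%.0s' {1..12})AC"   # 12 x "ACGT" + "AC" = 50 bases
  qual="$(printf 'I%.0s' {1..50})"       # one quality character per base
  printf '@dummy_read\n%s\n+\n%s\n' "$seq" "$qual" > "$1"
}

# prebuild_transcriptome_index <annotation.gtf> <bowtie_genome_index> <index_prefix>
prebuild_transcriptome_index() {
  local annotation="$1" genome="$2" index_prefix="$3"
  local scratch status=0
  scratch="$(mktemp -d)"
  write_dummy_fastq "$scratch/dummy.fq"
  # -G together with --transcriptome-index makes Tophat build the index under
  # <index_prefix> if it is not there yet; -o keeps the throwaway run separate.
  tophat -G "$annotation" --transcriptome-index="$index_prefix" \
    -o "$scratch/out" "$genome" "$scratch/dummy.fq" || status=$?
  rm -rf "$scratch"
  # A non-zero status may come from mapping the dummy read rather than from
  # the index build, so inspect the files under <index_prefix> before use.
  if [ "$status" -ne 0 ]; then
    echo "tophat exited with status $status; check $index_prefix.* by hand" >&2
  fi
  return "$status"
}

if [ "$#" -eq 3 ]; then
  prebuild_transcriptome_index "$1" "$2" "$3"
fi
```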

10.0 years ago

Create the transcriptome index during the first Tophat run (with -G and --transcriptome-index). Subsequent Tophat runs given the same --transcriptome-index value will reuse that index.
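
For a batch of samples, this means running the first sample alone and starting the rest only after it finishes. A minimal sketch using local background jobs (on a cluster you would more likely submit the first job and make the others depend on it through your scheduler); the function name, the `out_<n>` directory names and the single-end read files are illustrative assumptions.

```shell
#!/bin/bash
# Sketch: the first Tophat run builds the transcriptome index; the remaining
# samples then run in parallel against the finished index.
set -euo pipefail

# run_samples <annotation.gtf> <bowtie_genome_index> <index_prefix> <reads.fq>...
run_samples() {
  local annotation="$1" genome="$2" index_prefix="$3"
  shift 3
  local first="$1"
  shift

  # First run, in the foreground: -G plus --transcriptome-index builds the index.
  tophat -G "$annotation" --transcriptome-index="$index_prefix" \
    -o out_1 "$genome" "$first" || return $?

  # Later runs only point at the existing index and run concurrently.
  local n=2 pids=() reads
  for reads in "$@"; do
    tophat --transcriptome-index="$index_prefix" -o "out_$n" "$genome" "$reads" &
    pids+=("$!")
    n=$((n + 1))
  done

  # Wait for every background run; report failure if any of them failed.
  # ${pids[@]+...} expands to nothing for an empty array, even under set -u.
  local pid status=0
  for pid in ${pids[@]+"${pids[@]}"}; do
    wait "$pid" || status=1
  done
  return "$status"
}

if [ "$#" -ge 4 ]; then
  run_samples "$@"
fi
```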

Not a great idea, since the OP does not want to spend the time of a full Tophat run just to create the index. Also, this thread is 2.5 years old :)
