Question

tophat2 cannot find transcript file

0

8.5 years ago

ag1194 • 0

Hi, I am new to tophat2 and am using a paper as a reference. The paper lists the options they used [-r 25 --coverage-search -G --library-type fr-firststrand], but I am getting an error with them. This is the command line I use:

$ tophat -r 25 --coverage-search -G --library-type fr-firststrand /my_index directory/bowtie2/mm9 mysample.fastq &> tophat.log

In my log file it says:

Error: cannot find transcript file --library-type

I assume the error is due to using -G without providing an annotation file; however, the paper gives no details other than that the mm9 genome was used for mapping. I have read the tophat manual but couldn't figure out the cause of the error. Can anybody help me with this? Thanks!!

RNA-Seq • 3.3k views

ADD COMMENT • link updated 8.5 years ago by WouterDeCoster 47k • written 8.5 years ago by ag1194 • 0

2

The -G option takes a GTF/GFF file of known transcripts. Combined with --transcriptome-index, it is used to create a transcriptome-specific index from a whole-genome index, as in the command below. This is a one-time run; the index can then be reused in subsequent runs, for all samples, to align to just that part of the genome.

tophat -G known_genes.gtf \
    --transcriptome-index=transcriptome_data/known \
    hg19

When you later use this index, provide its location with --transcriptome-index:

tophat -o out_sample2 -p4 \
    --transcriptome-index=transcriptome_data/known \
    hg19 sample2_1.fq.z sample2_2.fq.z

ADD REPLY • link 8.5 years ago by GenoMax 147k

0

Thank you very much! So can I use any GTF file of known genes, regardless of the project? For example, can I pass one of the mouse GTF gene sets from http://useast.ensembl.org/info/data/ftp/index.html to the -G option?

ADD REPLY • link 8.5 years ago by ag1194 • 0

1

If you are trying to replicate the analysis in a paper, make sure you get the annotation from the same source and for the same genome build. Otherwise your results may differ from those in the paper.
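One practical check when picking an annotation source: TopHat expects the sequence names in the first column of the GTF to match the names in the Bowtie index (UCSC-style builds such as mm9 typically use `chr1`, while Ensembl files typically use `1`). Below is a minimal sketch of such a check, not something from the paper. The helper name `missing_chroms` and the file names are assumptions made up for illustration.

```shell
# Hypothetical helper (not part of TopHat): print each sequence name that
# appears in column 1 of a GTF but not in a list of index sequence names.
#   $1 = GTF file
#   $2 = text file with one index sequence name per line
missing_chroms() {
    awk -F'\t' 'NR == FNR { seen[$1] = 1; next }   # first file: index names
                /^#/ { next }                       # skip GTF comment lines
                !($1 in seen) && !reported[$1]++ { print $1 }' "$2" "$1"
}

# Possible usage (file names are placeholders):
#   bowtie2-inspect --names mm9 > index_names.txt
#   missing_chroms genes.gtf index_names.txt
```

If this prints nothing, every sequence name in the GTF was found in the list. If it prints names such as `1` or `MT`, the annotation and the index probably use different naming schemes.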

ADD REPLY • link 8.5 years ago by GenoMax 147k

0

Thank you very much for the help. One last thing: in general, can I use RefSeq for annotation, or does it have to be a specific location on a chromosome?

ADD REPLY • link 8.5 years ago by ag1194 • 0

0

If I understand the question correctly: RefSeq annotations would be standalone (though the accession numbers may be included in the GTF file you will use). So if you want to correlate gene names with RefSeq IDs, you should be able to do that.

ADD REPLY • link 8.5 years ago by GenoMax 147k

0

Thank you very much!

ADD REPLY • link 8.5 years ago by ag1194 • 0

score 1 · Answer 1 · 2016-06-02

1

8.5 years ago

WouterDeCoster 47k

As you can read when using

tophat --help

the -G flag requires a GTF/GFF file as its argument:

-G/--GTF                       <filename>  (GTF/GFF with known transcripts)
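This also explains the wording of the error in the question: because -G expects a file name, the next token on the command line (`--library-type`) is taken as the transcript file. The sketch below only illustrates that behaviour. `gtf_arg` is a hypothetical helper, not TopHat's actual option parser, and `genes.gtf` is a placeholder file name.

```shell
# Hypothetical helper (not TopHat code): print the token that directly
# follows -G/--GTF, i.e. what would be read as the annotation file name.
gtf_arg() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -G|--GTF)
                printf '%s\n' "${2-}"
                return 0
                ;;
        esac
        shift
    done
    return 1   # no -G/--GTF on the command line
}

# Options as given in the question: -G is followed directly by --library-type
gtf_arg -r 25 --coverage-search -G --library-type fr-firststrand

# Same options with an annotation file placed right after -G
gtf_arg -r 25 --coverage-search -G genes.gtf --library-type fr-firststrand
```

The first call prints `--library-type` and the second prints `genes.gtf`, so the fix for the original command is to put the path of an mm9 GTF/GFF file immediately after -G.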

ADD COMMENT • link 8.5 years ago by WouterDeCoster 47k