Tophat error : Error: gtf_to_fasta returned an error.
9.9 years ago
nikhilvgbt • 0
tophat -p 12 -G genes.gtf -o outtop hg18 SRR364314_1_trimmed.fq SRR364314_2_trimmed.fq
[2015-01-17 18:19:02] Beginning TopHat run (v2.0.9)
-----------------------------------------------
[2015-01-17 18:19:02] Checking for Bowtie
          Bowtie version:     2.1.0.0
[2015-01-17 18:19:02] Checking for Samtools
        Samtools version:     0.1.19.0
[2015-01-17 18:19:02] Checking for Bowtie index files (genome)..
[2015-01-17 18:19:02] Checking for reference FASTA file
    Warning: Could not find FASTA file hg18.fa
[2015-01-17 18:19:02] Reconstituting reference FASTA file from Bowtie index
  Executing: /usr/bin/bowtie2-inspect hg18 > outtop/tmp/hg18.fa
[2015-01-17 18:27:41] Generating SAM header for hg18
    format:         fastq
    quality scale:     phred33 (default)
[2015-01-17 18:28:36] Reading known junctions from GTF file
[2015-01-17 18:31:12] Preparing reads

WARNING: read pairing issues detected (check prep_reads.log) !

     left reads: min. length=12, max. length=100, 40632564 kept reads (354189 discarded)
    right reads: min. length=12, max. length=100, 40910664 kept reads (324280 discarded)
Warning: short reads (<20bp) will make TopHat quite slow and take large amount of memory because they are likely to be mapped in too many places
[2015-01-17 20:16:52] Building transcriptome data files..
    [FAILED]
 Error: gtf_to_fasta returned an error.

Please tell me how to solve this problem. I have tried two different reference files for Homo sapiens.

Could you please give a link to a proper reference genome and gene annotation file? I have tried files from NCBI and the UCSC browser, but both give me this error.

Tophat next-gen RNA-Seq • 9.9k views

You should use the same version of the FASTA and GTF files, taken from the same source. It doesn't matter whether you get them from UCSC, Ensembl, or GENCODE.


Hi Manvendra Singhji,

I have tried files from various sources. Although the index names in the GTF and reference files are the same, I still get the error.

Could you please suggest a proper data set, with both the GTF and the reference genome file, for Homo sapiens?


You should probably tell us which files you're using and the links you downloaded them from. That stands a better chance of getting help than flat-out asking folks to give you links.


Hi RamRS,

Thanks for your reply.

Please give proper links for both the gene GTF file and the Homo sapiens reference genome file.

These are the files from my two failed attempts; please check them.

First attempt:

Gene annotation (GTF): http://www.gencodegenes.org/releases/21.html

Reference genome: from the UCSC browser

Second attempt:

Gene annotation (GTF): ftp://ftp.ensembl.org/pub/release-64/gtf/homo_sapiens/

Reference genome: ftp://ftp.ncbi.nlm.nih.gov/sra/reports/Assembly/GRCh37-HG19_CompleteGenomics_variant/


Like Manvendra mentioned, you might want to get both files from the same source. Just naming them similarly does not address source-specific differences, such as UCSC naming chromosomes chr1 where Ensembl uses 1.
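
One quick way to see whether two files agree on sequence naming is to compare the names in the FASTA headers with column 1 of the GTF. The following is only a minimal sketch: the function names are made up for illustration, and it assumes an uncompressed FASTA and a tab-separated GTF.

```python
def fasta_names(fasta_lines):
    """Collect sequence names (first word after '>') from FASTA header lines."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_names(gtf_lines):
    """Collect sequence names from column 1 of a GTF, skipping comments."""
    names = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_names(fasta_lines, gtf_lines):
    """Report names shared by both files and names unique to each."""
    fa = fasta_names(fasta_lines)
    gtf = gtf_names(gtf_lines)
    return {
        "shared": sorted(fa & gtf),
        "only_in_gtf": sorted(gtf - fa),
        "only_in_fasta": sorted(fa - gtf),
    }
```

Passing two open file handles, e.g. compare_names(open("hg18.fa"), open("genes.gtf")), works because both helpers only iterate over lines. A long "only_in_gtf" list would suggest the two files use different naming schemes.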


Could you please give me the links for the reference genome and gene annotation files?


Like I mentioned in my earlier comment, this request stands a very low chance of getting a productive response from an online community because it shows a lack of willingness to put in the effort you need to put in.


Great, I will try, RamRS. You are right. I was just asking where I can download reference and gene files in matching formats. It's fine.


That is not a great attitude, nikhilvgbt. A simple search on this site yields at least 2 posts where your questions have been addressed in detail. They also include statements on why we emphasize that you be guided only to a certain point and you seek answers beyond that point.

This article might help you interact in forums better: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002202

EDIT: And here's something to get you started: How do experienced people look for full reference genomes?


An alternative is to download the data from iGenomes over FTP from the Linux command line.

The session looks like this:


braveheart@BraveHeart-SVE15116ENW ~ $ ftp

ftp> open ussd-ftp.illumina.com
Connected to ussd-ftp.illumina.com.
220 EFT Server 6.4.1 Build 12.19.2011.1
Name (ussd-ftp.illumina.com:braveheart): igenome
331 Password required for igenome.
Password:
230 Login OK. Proceed.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd Homo_sapiens/UCSC/
250 Folder changed to "/Homo_sapiens/UCSC".
ftp> ls -ltr
200 Command okay.
150 Opening ASCII mode data connection for file list.
dr-xr--r--   1 user     group           0 Jun  4  2014 hg18
dr-xr--r--   1 user     group           0 Jun 16  2014 hg19
226 Transfer complete. 122 bytes transferred. 122 bps.
ftp> cd hg19
250 Folder changed to "/Homo_sapiens/UCSC/hg19".
ftp> ls
200 Command okay.
150 Opening ASCII mode data connection for file list.
-r--r--r--   1 user     group 29666884240 Jun  4  2014 Homo_sapiens_UCSC_hg19.tar.gz
226 Transfer complete. 86 bytes transferred. 86 bps.
ftp> get Homo_sapiens_UCSC_hg19.tar.gz
local: Homo_sapiens_UCSC_hg19.tar.gz remote: Homo_sapiens_UCSC_hg19.tar.gz
200 Command okay.
150 Opening BINARY mode data connection for Homo_sapiens_UCSC_hg19.tar.gz.

Or, you could simply run

wget ftp://ussd-ftp.illumina.com/Homo_sapiens/UCSC/hg19/Homo_sapiens_UCSC_hg19.tar.gz

Anyway, please move this answer to a reply to my comment here: C: Tophat error : Error: gtf_to_fasta returned an error.


But I am still unable to solve my main error, which is the following:

Error: gtf_to_fasta returned an error.

Could you please help me?


If you are using builds from iGenomes, you should not face any errors with TopHat. The error suggests that there may be a problem with the GTF format. Please make sure the file is correctly formatted, and read the TopHat manual to find out what it expects in a GTF file.
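
A rough sanity check of the GTF layout can be sketched as below. It assumes the usual GTF conventions (nine tab-separated columns, 1-based start <= end, strand of +, - or ., and gene_id and transcript_id attributes); it is not a statement of exactly what TopHat's gtf_to_fasta validates, and the function names are hypothetical.

```python
def check_gtf_line(line):
    """Return a list of problems found in one GTF record (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return ["expected 9 tab-separated columns, found %d" % len(fields)]
    problems = []
    try:
        start, end = int(fields[3]), int(fields[4])
    except ValueError:
        problems.append("start/end are not integers")
    else:
        if start < 1 or start > end:
            problems.append("start must be >= 1 and <= end")
    if fields[6] not in ("+", "-", "."):
        problems.append("unexpected strand %r" % fields[6])
    # Assumed requirement: both IDs appear in the attribute column.
    for key in ("gene_id", "transcript_id"):
        if key + " " not in fields[8]:
            problems.append("missing %s attribute" % key)
    return problems


def check_gtf(gtf_lines):
    """Return (line_number, problem) pairs, skipping blanks and comments."""
    report = []
    for number, line in enumerate(gtf_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        for problem in check_gtf_line(line):
            report.append((number, problem))
    return report
```

An empty report does not prove the file is acceptable to TopHat, but a non-empty one usually points at the first thing to fix.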


And, move this answer to a reply to my comment here as well: Tophat error : Error: gtf_to_fasta returned an error.

9.9 years ago

My guess is that when you do some preprocessing, you need to make sure the reads remain in proper pairs. For example, given a pair Read_xxx_1 and Read_xxx_2, if preprocessing removes the mate Read_xxx_2, the two files lose their pairing order. You need to re-pair and reorder them, then try the alignment again. But as RamRS pointed out, you should really do a lot of homework before posting a question here.

A simple Google search turned up

http://seqanswers.com/forums/showthread.php?t=39063
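
The re-pairing idea can be sketched as follows, which may also help with the "read pairing issues detected" warning in the log. This is an illustration under stated assumptions, not a production tool: the helper names are hypothetical, mates are assumed to share a read name apart from an optional /1 or /2 suffix, every record is assumed to span exactly four lines, and all right-hand reads are held in memory, which would be heavy for 40 million reads.

```python
def base_name(header):
    """Read name without '@', description, or a trailing /1 or /2."""
    fields = header[1:].split()
    name = fields[0] if fields else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def records(fastq_lines):
    """Yield (name, four_line_record) for each complete FASTQ record."""
    for record in zip(*[iter(fastq_lines)] * 4):
        yield base_name(record[0]), record


def repair(left_lines, right_lines):
    """Keep only reads present in both inputs, in the order of the left input."""
    right = dict(records(right_lines))
    left_out, right_out = [], []
    for name, record in records(left_lines):
        mate = right.get(name)
        if mate is not None:
            left_out.extend(record)
            right_out.extend(mate)
    return left_out, right_out
```

The two returned lists can be written out with writelines() to give a pair of files in which record N of one file is the mate of record N of the other.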


Not even a lot of homework, Geek_y. Just enough so people can give you a productive answer in an optimum time-frame.


Let me add my solution for others who are trying to use TopHat:

For the Homo sapiens reference genome, download the index files from the Bowtie 2 website, http://bowtie-bio.sourceforge.net/bowtie2/index.shtml; the reference genome in FASTA format can be rebuilt from these.

For the Homo sapiens gene annotation file, use the UCSC Table Browser at http://genome.ucsc.edu/cgi-bin/hgTables; it will create a file with a .gtf extension.

Now you can try to run TopHat.

I am still not sure about the proper formats of both files, but my TopHat run is in progress and I hope to get results without errors.

Thanks, Geek_y and RamRS, for your replies.


If you are using TopHat, get the annotations and genome from http://ccb.jhu.edu/software/tophat/igenomes.shtml. They are well formatted for use with the tophat-cufflinks-cuffdiff pipeline and the htseq-count, DESeq/edgeR pipeline.


The data files at http://ccb.jhu.edu/software/tophat/igenomes.shtml are very large, and if the download is interrupted it cannot be resumed. Is there another way to download the same data files that are available in iGenomes?


Use curl or wget to download the file. If the download fails, you can resume it from where it stopped (wget -c, or curl -C - with the same URL and output file). Or try Galaxy. That will save you some of the upload/download trouble.

9.4 years ago

I had to troubleshoot this today. This error is also reported when an annotation in the GTF file contains a coordinate that extends beyond the length of the corresponding sequence in the genome file.

One reason for this can of course be that the GTF and the sequence come from different builds, but note that not every build mismatch produces an error. The error is triggered only if a sequence in the mismatched build is shorter than a coordinate annotated on it.
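
A check along these lines can be sketched as follows. It is only an illustration: the function names are hypothetical, and it assumes an uncompressed FASTA and a tab-separated GTF with the end coordinate in column 5.

```python
def sequence_lengths(fasta_lines):
    """Map each FASTA sequence name to its length in bases."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def out_of_range(gtf_lines, lengths):
    """Yield (line_number, seqname, end, length) for records that do not fit.

    length is None when the sequence name is absent from the FASTA.
    """
    for number, line in enumerate(gtf_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        try:
            end = int(fields[4])
        except ValueError:
            continue
        length = lengths.get(fields[0])
        if length is None or end > length:
            yield number, fields[0], end, length
```

Any record this yields is a candidate for the failure described above, either because its end coordinate lies past the end of the sequence or because the sequence is missing from the genome file altogether.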

7.0 years ago

Hi, I had almost the same problem, and I have solved it.

Here is my error:

[2017-11-22 10:04:30] Beginning TopHat run (v2.0.13)
-----------------------------------------------
[2017-11-22 10:04:30] Checking for Bowtie
          Bowtie version:    2.2.4.0
[2017-11-22 10:04:30] Checking for Bowtie index files (genome)..
[2017-11-22 10:04:30] Checking for reference FASTA file
[2017-11-22 10:04:30] Generating SAM header for /project/meiyue/Selaginella_kraussiana/GenomeAssemblyWithNGSData/S_kraussiana.bowtie2.index/S_kraussiana
[2017-11-22 10:04:31] Reading known junctions from GTF file
    Warning: TopHat did not find any junctions in GTF file
[2017-11-22 10:04:31] Preparing reads
     left reads: min. length=20, max. length=101, 233624 kept reads (0 discarded)
    right reads: min. length=20, max. length=101, 233624 kept reads (0 discarded)
[2017-11-22 10:04:39] Building transcriptome data files ./tophat_out/tmp/S_kraussiana_genome.maker_v2.filter
    [FAILED]
 Error: gtf_to_fasta returned an error.

I extracted only the "exon" records with

awk -F'\t' '$3 == "exon"' file.gtf > file.exon.gtf

and that solved my problem.

I hope this helps.
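
Before filtering, it can help to see which feature types a GTF actually contains. The sketch below tallies column 3 and offers a Python version of the same exon filter; the function names are hypothetical and a tab-separated file is assumed.

```python
from collections import Counter


def feature_counts(gtf_lines):
    """Count how often each feature type (column 3) occurs."""
    counts = Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


def keep_feature(gtf_lines, feature="exon"):
    """Return only the records whose feature type matches `feature`."""
    kept = []
    for line in gtf_lines:
        fields = line.split("\t")
        if len(fields) >= 3 and fields[2] == feature:
            kept.append(line)
    return kept
```

If feature_counts shows no "exon" entries at all, filtering will leave an empty file, and the annotation probably needs converting rather than filtering.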
