Question

TopHat Failed In 'Reporting output tracks' Stage

1

8.0 years ago

ivan.molodtsov ▴ 10

Hello everyone!

I've run into the following problem while working with ENCODE Project data; maybe someone here can give me some advice.

I downloaded the raw sequencing data for library ENCLB555APY from https://www.encodeproject.org/experiments/ENCSR000CPY/ and tried to map it to the human genome (hg38) downloaded from ftp://igenome:G3nom3s4u@ussd-ftp.illumina.com/Homo_sapiens/UCSC/hg38/Homo_sapiens_UCSC_hg38.tar.gz.

I then ran TopHat v2.1.0 as follows:

tophat2 -p 8 --b2-very-sensitive -o tophat_res/ENCLB555APY/ Homo_sapiens/UCSC/hg38/Sequence/Bowtie2Index/genome  ENCFF000HGG.fastq.gz ENCFF000HHF.fastq.gz

It failed with the following tophat.log, ending in an error that I could not find anything about online:

[2016-12-03 17:27:10] Beginning TopHat run (v2.1.0)
-----------------------------------------------
[2016-12-03 17:27:10] Checking for Bowtie
                  Bowtie version:        2.2.6.0
[2016-12-03 17:27:10] Checking for Bowtie index files (genome)..
[2016-12-03 17:27:10] Checking for reference FASTA file
[2016-12-03 17:27:10] Generating SAM header for Homo_sapiens/UCSC/hg38/Sequence/Bowtie2Index/genome
[2016-12-03 17:27:12] Preparing reads
         left reads: min. length=76, max. length=76, 133266131 kept reads (283399 discarded)
        right reads: min. length=76, max. length=76, 133088200 kept reads (461330 discarded)
[2016-12-03 18:28:22] Mapping left_kept_reads to genome genome with Bowtie2
[2016-12-04 03:31:38] Mapping left_kept_reads_seg1 to genome genome with Bowtie2 (1/3)
[2016-12-04 03:42:39] Mapping left_kept_reads_seg2 to genome genome with Bowtie2 (2/3)
[2016-12-04 03:53:31] Mapping left_kept_reads_seg3 to genome genome with Bowtie2 (3/3)
[2016-12-04 04:07:24] Mapping right_kept_reads to genome genome with Bowtie2
[2016-12-04 12:42:21] Mapping right_kept_reads_seg1 to genome genome with Bowtie2 (1/3)
[2016-12-04 13:04:21] Mapping right_kept_reads_seg2 to genome genome with Bowtie2 (2/3)
[2016-12-04 13:24:26] Mapping right_kept_reads_seg3 to genome genome with Bowtie2 (3/3)
[2016-12-04 13:48:17] Searching for junctions via segment mapping
[2016-12-04 14:04:52] Retrieving sequences for splices
[2016-12-04 14:06:20] Indexing splices
[2016-12-04 14:06:53] Mapping left_kept_reads_seg1 to genome segment_juncs with Bowtie2 (1/3)
[2016-12-04 14:10:07] Mapping left_kept_reads_seg2 to genome segment_juncs with Bowtie2 (2/3)
[2016-12-04 14:13:25] Mapping left_kept_reads_seg3 to genome segment_juncs with Bowtie2 (3/3)
[2016-12-04 14:16:18] Joining segment hits
[2016-12-04 14:20:55] Mapping right_kept_reads_seg1 to genome segment_juncs with Bowtie2 (1/3)
[2016-12-04 14:29:53] Mapping right_kept_reads_seg2 to genome segment_juncs with Bowtie2 (2/3)
[2016-12-04 14:36:11] Mapping right_kept_reads_seg3 to genome segment_juncs with Bowtie2 (3/3)
[2016-12-04 14:40:37] Joining segment hits
[2016-12-04 14:47:37] Reporting output tracks
        [FAILED]
Error running /usr/bin/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir tophat_res/ENCLB555APY// --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p8 --inner-dist-mean 50 --inner-dist-std-dev 20 --no-closure-search --no-coverage-search --no-microexon-search --sam-header tophat_res/ENCLB555APY//tmp/genome_genome.bwt.samheader.sam --report-discordant-pair-alignments --report-mixed-alignments --samtools=/usr/bin/samtools_0.1.18 --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 Homo_sapiens/UCSC/hg38/Sequence/Bowtie2Index/genome.fa tophat_res/ENCLB555APY//junctions.bed tophat_res/ENCLB555APY//insertions.bed tophat_res/ENCLB555APY//deletions.bed tophat_res/ENCLB555APY//fusions.out tophat_res/ENCLB555APY//tmp/accepted_hits tophat_res/ENCLB555APY//tmp/left_kept_reads.mapped.bam,tophat_res/ENCLB555APY//tmp/left_kept_reads.candidates tophat_res/ENCLB555APY//tmp/left_kept_reads.bam tophat_res/ENCLB555APY//tmp/right_kept_reads.mapped.bam,tophat_res/ENCLB555APY//tmp/right_kept_reads.candidates tophat_res/ENCLB555APY//tmp/right_kept_reads.bam
Error: failed to retrieve right read for pair # 2037377 !

This looks like an error in the input files, but I think that is highly improbable. What have I done wrong, and is there any way to get past this error without re-running TopHat?

Thanks in advance,

Ivan
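One way to test the input-file hypothesis raised above is to check that the two FASTQ files list the same reads in the same order. The following is a minimal sketch, not a TopHat tool: the function names `read_ids` and `first_pairing_problem` are made up for illustration, and it assumes plain 4-line FASTQ records whose headers are either `@NAME/1` and `@NAME/2` or `@NAME` followed by a space. A clean result would not rule out other causes, and the pair number it reports counts raw records, which may not match the numbering in TopHat's message because TopHat discards some reads first.

```python
import gzip
import itertools


def read_ids(path):
    """Yield the read name from each 4-line FASTQ record in a gzipped file."""
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # Only every fourth line is a header; skip blank trailing lines.
            if line_number % 4 != 0 or not line.strip():
                continue
            # Drop the leading '@' and anything after the first whitespace.
            name = line[1:].split()[0]
            # Drop a trailing /1 or /2 so that mates compare equal.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_pairing_problem(left_path, right_path):
    """Return (pair_number, left_id, right_id) for the first mismatch, else None.

    A missing mate (one file shorter than the other) shows up as None
    in place of the read name.
    """
    pairs = itertools.zip_longest(read_ids(left_path), read_ids(right_path))
    for pair_number, (left_id, right_id) in enumerate(pairs, start=1):
        if left_id != right_id:
            return pair_number, left_id, right_id
    return None
```

With the files from the command above, this would be called as `first_pairing_problem("ENCFF000HGG.fastq.gz", "ENCFF000HHF.fastq.gz")`. Expect it to take a while on roughly 133 million read pairs, since it decompresses both files in full when no mismatch is found.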

RNA-Seq rna-seq software error TopHat TopHat2 • 3.4k views

updated 6.9 years ago by zuolin.bai • 0 • written 8.0 years ago by ivan.molodtsov ▴ 10

0

Hello Ivan,

I got the same error. Could you please let me know how to fix it, or what the cause is?

Any help is appreciated.

Zuolin Bai

6.9 years ago by zuolin.bai • 0

0

You should know that the old 'Tuxedo' pipeline of TopHat(2) and Cufflinks is no longer the recommended approach for RNA-seq analysis. The software is deprecated or in low-maintenance mode and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR or BBMap, or pseudo-alignment using Salmon.

Please stop using Tophat https://t.co/Es4ohxOEyx Cole and I developed the method in *2008*. It was greatly improved in TopHat2 then HISAT & HISAT2. There is no reason to use it anymore. I have been saying this for years yet it has more citations this year than last #methodsmatter
— Lior Pachter (@lpachter) December 2, 2017

6.9 years ago by WouterDeCoster 47k

score 1 · Answer 1 · 2016-12-04

1

8.0 years ago

mastal511 ★ 2.1k

I've had TopHat fail at the 'Reporting output tracks' stage because it ran out of memory, but I didn't get the error message you're getting.

8.0 years ago by mastal511 ★ 2.1k

0

Thanks for the response! The machine has 32 GB of RAM and approx. 900 GB of free disk space, so I wouldn't expect any memory problems. The 'failed to retrieve right read for pair' error puzzles me as well.

8.0 years ago by ivan.molodtsov ▴ 10

0

Actually, 32 GB is not that much, given that you have 133 million read pairs. Check how much memory TopHat is using while it runs.
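Besides watching the process live in a tool such as `top`, the peak can be recorded after the fact. The sketch below is one way to do that, assuming a Unix-like system: it runs a command as a child process and then reads the peak resident set size from Python's standard `resource` module. The function name `run_and_report_peak_memory` is made up for illustration. The value is the peak of the largest single waited-for process, not the sum over everything the command spawns, so treat it as a lower bound on total memory pressure.

```python
import resource
import subprocess
import sys


def run_and_report_peak_memory(cmd):
    """Run cmd to completion and return (exit_code, peak_rss_kib).

    peak_rss_kib is the maximum resident set size of the largest child
    process that has been waited for so far, in KiB.
    """
    completed = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    peak = usage.ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        peak //= 1024
    return completed.returncode, peak


# Example call (arguments taken from the command in the question):
# code, peak_kib = run_and_report_peak_memory(
#     ["tophat2", "-p", "8", "--b2-very-sensitive",
#      "-o", "tophat_res/ENCLB555APY/",
#      "Homo_sapiens/UCSC/hg38/Sequence/Bowtie2Index/genome",
#      "ENCFF000HGG.fastq.gz", "ENCFF000HHF.fastq.gz"])
# print(f"exit code {code}, peak RSS {peak_kib / 1024**2:.1f} GiB")
```

If the reported peak is close to the 32 GB of installed RAM, running out of memory becomes a more plausible explanation for the failure.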

8.0 years ago by mastal511 ★ 2.1k