Error running /usr/local/bin/tophat_reports

Greetings:

I am currently getting an error:

    [2016-06-20 11:21:43] Beginning TopHat run (v2.1.0)
-----------------------------------------------
[2016-06-20 11:21:43] Checking for Bowtie
          Bowtie version:    2.2.4.0
[2016-06-20 11:21:43] Checking for Bowtie index files (transcriptome)..
[2016-06-20 11:21:43] Checking for Bowtie index files (genome)..
[2016-06-20 11:21:43] Checking for reference FASTA file
[2016-06-20 11:21:43] Generating SAM header for indices/NC_000913.3
[2016-06-20 11:21:43] Reading known junctions from GTF file
[2016-06-20 11:21:43] Preparing reads
     left reads: min. length=21, max. length=250, 817142 kept reads (73 discarded)
[2016-06-20 11:22:04] Using pre-built transcriptome data..
[2016-06-20 11:22:04] Mapping left_kept_reads to transcriptome NZ_CP015023 with Bowtie2 
[2016-06-20 11:23:24] Resuming TopHat pipeline with unmapped reads
[2016-06-20 11:23:24] Mapping left_kept_reads.m2g_um to genome NC_000913.3 with Bowtie2 
[2016-06-20 11:23:55] Mapping left_kept_reads.m2g_um_seg1 to genome NC_000913.3 with Bowtie2 (1/10)
[2016-06-20 11:24:01] Mapping left_kept_reads.m2g_um_seg2 to genome NC_000913.3 with Bowtie2 (2/10)
[2016-06-20 11:24:07] Mapping left_kept_reads.m2g_um_seg3 to genome NC_000913.3 with Bowtie2 (3/10)
[2016-06-20 11:24:12] Mapping left_kept_reads.m2g_um_seg4 to genome NC_000913.3 with Bowtie2 (4/10)
[2016-06-20 11:24:17] Mapping left_kept_reads.m2g_um_seg5 to genome NC_000913.3 with Bowtie2 (5/10)
[2016-06-20 11:24:19] Mapping left_kept_reads.m2g_um_seg6 to genome NC_000913.3 with Bowtie2 (6/10)
[2016-06-20 11:24:21] Mapping left_kept_reads.m2g_um_seg7 to genome NC_000913.3 with Bowtie2 (7/10)
[2016-06-20 11:24:22] Mapping left_kept_reads.m2g_um_seg8 to genome NC_000913.3 with Bowtie2 (8/10)
[2016-06-20 11:24:22] Mapping left_kept_reads.m2g_um_seg9 to genome NC_000913.3 with Bowtie2 (9/10)
[2016-06-20 11:24:22] Mapping left_kept_reads.m2g_um_seg10 to genome NC_000913.3 with Bowtie2 (10/10)
[2016-06-20 11:24:23] Searching for junctions via segment mapping
[2016-06-20 11:24:32] Retrieving sequences for splices
[2016-06-20 11:24:32] Indexing splices
Building a SMALL index
[2016-06-20 11:24:32] Mapping left_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/10)
[2016-06-20 11:24:34] Mapping left_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/10)
[2016-06-20 11:24:36] Mapping left_kept_reads.m2g_um_seg3 to genome segment_juncs with Bowtie2 (3/10)
[2016-06-20 11:24:37] Mapping left_kept_reads.m2g_um_seg4 to genome segment_juncs with Bowtie2 (4/10)
[2016-06-20 11:24:38] Mapping left_kept_reads.m2g_um_seg5 to genome segment_juncs with Bowtie2 (5/10)
[2016-06-20 11:24:39] Mapping left_kept_reads.m2g_um_seg6 to genome segment_juncs with Bowtie2 (6/10)
[2016-06-20 11:24:39] Mapping left_kept_reads.m2g_um_seg7 to genome segment_juncs with Bowtie2 (7/10)
[2016-06-20 11:24:39] Mapping left_kept_reads.m2g_um_seg8 to genome segment_juncs with Bowtie2 (8/10)
[2016-06-20 11:24:40] Mapping left_kept_reads.m2g_um_seg9 to genome segment_juncs with Bowtie2 (9/10)
[2016-06-20 11:24:40] Mapping left_kept_reads.m2g_um_seg10 to genome segment_juncs with Bowtie2 (10/10)
[2016-06-20 11:24:40] Joining segment hits
[2016-06-20 11:24:42] Reporting output tracks
    [FAILED]
Error running /usr/local/bin/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir tophat_output_201c_1LBG/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip --gtf-annotations transcriptome_data_201c/NZ_CP015023.gff --gtf-juncs tophat_output_201c_1LBG/tmp/NZ_CP015023.juncs --no-closure-search --no-coverage-search --no-microexon-search --sam-header tophat_output_201c_1LBG/tmp/NC_000913.3_genome.bwt.samheader.sam --report-mixed-alignments --samtools=/usr/local/bin/samtools_0.1.18 --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 indices/NC_000913.3.fa tophat_output_201c_1LBG/junctions.bed tophat_output_201c_1LBG/insertions.bed tophat_output_201c_1LBG/deletions.bed tophat_output_201c_1LBG/fusions.out tophat_output_201c_1LBG/tmp/accepted_hits tophat_output_201c_1LBG/tmp/left_kept_reads.m2g.bam,tophat_output_201c_1LBG/tmp/left_kept_reads.m2g_um.mapped.bam,tophat_output_201c_1LBG/tmp/left_kept_reads.m2g_um.candidates.bam tophat_output_201c_1LBG/tmp/left_kept_reads.bam
Loaded 24 junctions

This error has been reported already: tophat reporting output problem, Tophat_Reports "Failed" Error, Tophat2 Reporting output tracks failed, Tophat2 output error, What causes tophat_reports error?

The suggested causes have ranged from an inability to set the number of processors to more than 1, to insufficient memory; none of these seems to be my issue, with the exception of one post that says this error can occur when none of your reads map (https://www.biostars.org/p/131271/).

DESIGN & WHAT I WOULD LIKE TO DO:

The experiment involved two E. coli strains in two different environmental/experimental conditions (3 replicates per strain per condition). I have already compared each E. coli strain between the two experimental conditions; now I need to compare the two strains of E. coli to each other under the same environmental condition. This approach may not be sound, but if you would like to weigh in on that, you can do so here: RNA-seq: Comparing 2 strains (same experimental conditions)

These are single-end reads, not paired-end.

To do this, I have decided to map both strains to the K-12 reference genome (NC_000913.3). I have the GFF file and the FASTA file for this reference genome.

CODE:

bowtie2-build -f indices/NC_000913.3.fa indices/NC_000913.3

tophat2 --output-dir tophat_output_201c_1LBG -p 8 --GTF indices/NC_000913.3.gff3 --transcriptome-index=transcriptome_data_201c/NZ_CP015023 indices/NC_000913.3 ../data/rna_seq_files/201_LBG01_F.fastq
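
One sanity check that may be relevant here (using the paths from the commands above; I have not confirmed this is the cause) is whether the sequence names in the GFF match the FASTA headers, and what the pre-built transcriptome index was actually built from:

    # Hypothetical check: column 1 of the GFF must use the same sequence name as the
    # FASTA header, otherwise the annotation is effectively ignored during mapping
    grep "^>" indices/NC_000913.3.fa
    grep -v "^#" indices/NC_000913.3.gff3 | cut -f1 | sort -u

    # Also list which reference the pre-built transcriptome index came from
    ls -l transcriptome_data_201c/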

TROUBLESHOOTING:

  • I have changed the number of processors (1, 8, 30) and get the same error.
  • I have tried adding the --no-discordant option, even though I am not dealing with paired-end reads.
  • I watched memory usage during the run; at its peak it used 3627 of 129046 MB.
  • I have checked the log files. Perhaps I don't know what to look for, but the few entries that are there were not informative for finding the cause of this error (the commands I used to inspect them are shown after this list).
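
Roughly, this is how I looked at the logs (I am assuming the usual logs/ subdirectory layout here; the exact file names may differ between TopHat versions):

    # Skim whatever TopHat wrote under the output directory's logs/ folder; the stderr
    # of the failing tophat_reports step should end up in one of these files
    ls tophat_output_201c_1LBG/logs/
    tail -n 50 tophat_output_201c_1LBG/logs/*.log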

QUESTIONS:

I am concerned that the error is caused by trying to map these strains against this reference genome, but I cannot verify that. Can anybody confirm that the commands I am running are appropriate for what I want to do, and/or suggest something beyond what has already been listed? Also, can someone tell me how I can determine whether any actual mapping occurred (the log suggests that it did)?
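
For the last question, what I had in mind was to run flagstat on the intermediate BAM files named in the error message (paths taken from the command line above; they are still in tmp/ because the run never finished), but I am not sure this is the right way to check:

    # Count mapped reads in the intermediate alignments from the transcriptome and genome steps
    /usr/local/bin/samtools_0.1.18 flagstat tophat_output_201c_1LBG/tmp/left_kept_reads.m2g.bam
    /usr/local/bin/samtools_0.1.18 flagstat tophat_output_201c_1LBG/tmp/left_kept_reads.m2g_um.mapped.bam

    # Or simply count the alignment records
    /usr/local/bin/samtools_0.1.18 view -c tophat_output_201c_1LBG/tmp/left_kept_reads.m2g.bam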

ADDITIONAL INFORMATION:

CentOS 6.8, TopHat v2.1.0, Bowtie 2.2.4.0

Thank you.

Not answering your question directly, but do you need to use TopHat at all, since this is bacterial data (no splicing)? You could use any aligner (I recommend BBMap) and then use featureCounts to generate the count matrix.
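
A rough sketch of what I mean, using the file names from your post; treat the featureCounts -t/-g values as placeholders, since the right feature type and attribute depend on how your GFF3 is structured:

    # Align the single-end reads with BBMap (it builds its index on the fly from ref=)
    bbmap.sh ref=indices/NC_000913.3.fa in=../data/rna_seq_files/201_LBG01_F.fastq out=201_LBG01.sam

    # Count reads per gene; for an NCBI-style GFF3 something like -t gene -g locus_tag often
    # works, but adjust (or convert the annotation to GTF/SAF) as needed
    featureCounts -a indices/NC_000913.3.gff3 -t gene -g locus_tag -o counts_201_LBG01.txt 201_LBG01.sam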


Thank you for the response, genomax2. I would greatly prefer to use TopHat, since I would like to do the differential analysis on the TopHat output with Cufflinks (I already have the pipelines set up from my previous work), but if I cannot figure out how to solve my current problem, I may try out BBMap.
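
For context, the downstream step I already have set up looks roughly like this (the BAM names and condition labels below are placeholders, not my actual file names):

    # Differential analysis with Cuffdiff: one comma-separated group of replicate BAMs per condition
    cuffdiff -o cuffdiff_out -p 8 -L strainA,strainB indices/NC_000913.3.gff3 \
        strainA_rep1.bam,strainA_rep2.bam,strainA_rep3.bam \
        strainB_rep1.bam,strainB_rep2.bam,strainB_rep3.bam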
