I'm very new to working with TopHat and Bowtie. I'm trying to align a set of paired-end RNA-seq reads onto a reference genome. According to tophat.log, left_kept_reads and right_kept_reads are being successfully mapped to the transcriptome. However, when the TopHat pipeline resumes with the unmapped reads and Bowtie2 tries to map left_kept_reads.m2g_um to the reference genome, it logs a message: "[bam_header_read] EOF marker is absent. The input is probably truncated." A minute or so later, the process throws an error and terminates.
When I examine the files left_kept_reads.m2g_um.bam and right_kept_reads.m2g_um.bam, I find that both of them are missing the 28-byte block at the end that samtools recognizes as the EOF marker for a .bam file. I assume that's what is causing the program to crash, but I don't know why the EOF block isn't being added, or what I can do about it.
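For anyone wanting to reproduce the check described above, here is a minimal sketch that compares the last 28 bytes of a file against the empty BGZF block that terminates a well-formed BAM file. The function name has_bam_eof is made up for illustration, and the example path in the usage line assumes the files sit in the tmp directory under the TopHat output folder; adjust it to wherever they actually are.

```shell
# Hex encoding of the 28-byte empty BGZF block that marks the end of a BAM file
BGZF_EOF="1f8b08040000000000ff0600424302001b0003000000000000000000"

# has_bam_eof is a hypothetical helper name, not part of samtools or TopHat.
# It exits 0 when the last 28 bytes of the file given as $1 match the EOF block.
has_bam_eof() {
    tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$tail_hex" = "$BGZF_EOF" ]
}

# Example usage (path is an assumption):
# has_bam_eof P0_2_tophat/tmp/left_kept_reads.m2g_um.bam && echo "EOF present" || echo "EOF missing"
```

With a recent samtools on the PATH (such as the 1.8 module loaded here), `samtools quickcheck -v file.bam` should report the same problem, since it also looks for the EOF block.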
The tophat commands I'm running are:
module load samtools/1.8
module load boost/1.66.0_gcc5+
module load bowtie2/2.3.2
module load tophat/2.0.13
tophat -r 200 -G /project/bf528/project_2/reference/annot/mm9.gtf --segment-length=20 --segment-mismatches=1 --no-novel-juncs -o P0_2_tophat -p 16 /project/bf528/project_2/reference/mm9 P0_1_1.fastq P0_1_2.fastq
The tophat.log file shows:
- [2019-06-05 13:12:56] Beginning TopHat run (v2.0.13)
- [2019-06-05 13:12:56] Checking for Bowtie Bowtie version: 2.3.2.0
- [2019-06-05 13:12:56] Checking for Bowtie index files (genome)..
- [2019-06-05 13:12:56] Checking for reference FASTA file
- [2019-06-05 13:12:56] Generating SAM header for /project/bf528/project_2/reference/mm9
- [2019-06-05 13:12:58] Reading known junctions from GTF file
- [2019-06-05 13:13:05] Preparing reads left reads: min. length=40, max. length=40, 21561496 kept reads (16066 discarded) right reads: min. length=40, max. length=40, 21347948 kept reads (229614 discarded)
- [2019-06-05 13:18:56] Building transcriptome data files P0_2_tophat/tmp/mm9
- [2019-06-05 13:19:13] Building Bowtie index from mm9.fa
- [2019-06-05 13:31:32] Mapping left_kept_reads to transcriptome mm9 with Bowtie2
- [2019-06-05 13:40:33] Mapping right_kept_reads to transcriptome mm9 with Bowtie2
- [2019-06-05 13:49:22] Resuming TopHat pipeline with unmapped reads
- [2019-06-05 13:49:22] Mapping left_kept_reads.m2g_um to genome mm9 with Bowtie2
- [bam_header_read] EOF marker is absent. The input is probably truncated.
- [2019-06-05 13:49:36] Retrieving sequences for splices
- [2019-06-05 13:50:42] Indexing splices [FAILED]
- Error: Splice sequence indexing failed with err =1
Thanks in advance!
Unless there is a dire need for tophat, use a current aligner such as STAR where possible.
Before anything, I must state that I agree with genomax and think you should consider a more recent RNA-seq aligner.
Maybe the problem is with the SAMtools you are loading. From the release notes:
I would expect TopHat2 to preferentially use its bundled SAMtools, but you could try running without module load samtools/1.8 and see if this helps.
Another thing to consider is an incompatibility between the particular TopHat version (from 2014) and Bowtie2 version (from 2017). You could try updating to the latest versions of both tools.