Hello everyone!
I've run into the following problem while working with ENCODE Project data; maybe someone here can give me some advice.
I downloaded the raw sequencing data for library ENCLB555APY from https://www.encodeproject.org/experiments/ENCSR000CPY/ and tried to map it to the human genome downloaded from ftp://igenome:G3nom3s4u@ussd-ftp.illumina.com/Homo_sapiens/UCSC/hg38/Homo_sapiens_UCSC_hg38.tar.gz.
I then ran TopHat v2.1.0 as follows:
tophat2 -p 8 --b2-very-sensitive -o tophat_res/ENCLB555APY/ Homo_sapiens/UCSC/hg38/Sequence/Bowtie2Index/genome ENCFF000HGG.fastq.gz ENCFF000HHF.fastq.gz
and it failed with the following tophat.log, ending in an error that I couldn't find anything about on Google:
[2016-12-03 17:27:10] Beginning TopHat run (v2.1.0)
-----------------------------------------------
[2016-12-03 17:27:10] Checking for Bowtie
Bowtie version: 2.2.6.0
[2016-12-03 17:27:10] Checking for Bowtie index files (genome)..
[2016-12-03 17:27:10] Checking for reference FASTA file
[2016-12-03 17:27:10] Generating SAM header for Homo_sapiens/UCSC/hg38/Sequence/Bowtie2Index/genome
[2016-12-03 17:27:12] Preparing reads
left reads: min. length=76, max. length=76, 133266131 kept reads (283399 discarded)
right reads: min. length=76, max. length=76, 133088200 kept reads (461330 discarded)
[2016-12-03 18:28:22] Mapping left_kept_reads to genome genome with Bowtie2
[2016-12-04 03:31:38] Mapping left_kept_reads_seg1 to genome genome with Bowtie2 (1/3)
[2016-12-04 03:42:39] Mapping left_kept_reads_seg2 to genome genome with Bowtie2 (2/3)
[2016-12-04 03:53:31] Mapping left_kept_reads_seg3 to genome genome with Bowtie2 (3/3)
[2016-12-04 04:07:24] Mapping right_kept_reads to genome genome with Bowtie2
[2016-12-04 12:42:21] Mapping right_kept_reads_seg1 to genome genome with Bowtie2 (1/3)
[2016-12-04 13:04:21] Mapping right_kept_reads_seg2 to genome genome with Bowtie2 (2/3)
[2016-12-04 13:24:26] Mapping right_kept_reads_seg3 to genome genome with Bowtie2 (3/3)
[2016-12-04 13:48:17] Searching for junctions via segment mapping
[2016-12-04 14:04:52] Retrieving sequences for splices
[2016-12-04 14:06:20] Indexing splices
[2016-12-04 14:06:53] Mapping left_kept_reads_seg1 to genome segment_juncs with Bowtie2 (1/3)
[2016-12-04 14:10:07] Mapping left_kept_reads_seg2 to genome segment_juncs with Bowtie2 (2/3)
[2016-12-04 14:13:25] Mapping left_kept_reads_seg3 to genome segment_juncs with Bowtie2 (3/3)
[2016-12-04 14:16:18] Joining segment hits
[2016-12-04 14:20:55] Mapping right_kept_reads_seg1 to genome segment_juncs with Bowtie2 (1/3)
[2016-12-04 14:29:53] Mapping right_kept_reads_seg2 to genome segment_juncs with Bowtie2 (2/3)
[2016-12-04 14:36:11] Mapping right_kept_reads_seg3 to genome segment_juncs with Bowtie2 (3/3)
[2016-12-04 14:40:37] Joining segment hits
[2016-12-04 14:47:37] Reporting output tracks
[FAILED]
Error running /usr/bin/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir tophat_res/ENCLB555APY// --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p8 --inner-dist-mean 50 --inner-dist-std-dev 20 --no-closure-search --no-coverage-search --no-microexon-search --sam-header tophat_res/ENCLB555APY//tmp/genome_genome.bwt.samheader.sam --report-discordant-pair-alignments --report-mixed-alignments --samtools=/usr/bin/samtools_0.1.18 --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 Homo_sapiens/UCSC/hg38/Sequence/Bowtie2Index/genome.fa tophat_res/ENCLB555APY//junctions.bed tophat_res/ENCLB555APY//insertions.bed tophat_res/ENCLB555APY//deletions.bed tophat_res/ENCLB555APY//fusions.out tophat_res/ENCLB555APY//tmp/accepted_hits tophat_res/ENCLB555APY//tmp/left_kept_reads.mapped.bam,tophat_res/ENCLB555APY//tmp/left_kept_reads.candidates tophat_res/ENCLB555APY//tmp/left_kept_reads.bam tophat_res/ENCLB555APY//tmp/right_kept_reads.mapped.bam,tophat_res/ENCLB555APY//tmp/right_kept_reads.candidates tophat_res/ENCLB555APY//tmp/right_kept_reads.bam
Error: failed to retrieve right read for pair # 2037377 !
It looks like an error in the input files, but I think that is highly improbable. So what have I done wrong, and is there any way to get past this error without re-running TopHat?
Thanks in advance,
Ivan
Hello Ivan,
I got the same error. Could you please let me know how to fix it, or what the cause is?
Any help is appreciated.
Zuolin Bai
You should know that the old 'Tuxedo' pipeline of TopHat(2) and Cufflinks is no longer the recommended tool for RNA-seq analysis. The software is deprecated or in low maintenance and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR or BBMap, or pseudo-alignment with Salmon.
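Whichever aligner you end up with, it may be worth ruling out the input files first, since the error complains about a missing right read for a pair. The log above shows that both files hold the same number of reads (133266131 + 283399 = 133549530 left, 133088200 + 461330 = 133549530 right), so a simple count will not reveal anything. What can still be checked is whether the read names agree record by record. Below is a minimal sketch of such a check, not a diagnosis: I can't confirm that out-of-sync mates are the cause here. The function names are my own, and the sketch assumes plain 4-line FASTQ records with an optional /1 or /2 suffix on the read name.

```python
import gzip
from itertools import zip_longest


def normalise(header):
    """Reduce a FASTQ header line to a bare read name (assumed naming scheme)."""
    fields = header[1:].split()      # drop the leading '@' and any trailing comment
    name = fields[0] if fields else ""
    if name.endswith(("/1", "/2")):  # strip an old-style mate suffix, if present
        name = name[:-2]
    return name


def read_names(path):
    """Yield the read name of every 4-line record in a plain or gzipped FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            # Every fourth line is a header; blank trailing lines are skipped.
            if i % 4 == 0 and line.strip():
                yield normalise(line)


def first_mismatch(left_path, right_path):
    """Return (record number, left name, right name) for the first pair that
    disagrees, or None if both files list the same names in the same order.
    A file that ends early shows up as None on that side."""
    pairs = zip_longest(read_names(left_path), read_names(right_path))
    for number, (left, right) in enumerate(pairs, start=1):
        if left != right:
            return number, left, right
    return None


# Example call with the file names from the question:
# print(first_mismatch("ENCFF000HGG.fastq.gz", "ENCFF000HHF.fastq.gz"))
```

If this returns None, the two files are at least consistently named and ordered, and the problem lies elsewhere. If it returns a record number, I would not assume that number lines up with the "pair #" in TopHat's message, since TopHat discards some reads before mapping.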