Hi there, I'm an undergrad working as a summer student in a lab and I'm having trouble analyzing RNA-seq data generated by an Ion Proton.
The pipeline suggested by Life Technologies involves a 2-step alignment, the first of which is done with tophat2, and I'm stuck on this step...
the command suggested by Life Technologies:
tophat2 -p 12 --keep-fasta-order --GTF known_genes.gtf \
hg19_bowtie2_index adaptorTrim.fastq
my command:
tophat2 -p 2 --keep-fasta-order -G ../RNAseq_test/h_sapiens_37_asm.gtf ~/bowtie2-2.2.3/scripts/h_sapiens_37_asm adaptorTrim.fastq
The h_sapiens_37_asm.gtf file is actually the NCBI 37.2 build.
h_sapiens_37_asm is the name of the bowtie2 index files built by running the make_h_sapiens_ncbi37.sh script.
adaptorTrim.fastq is the Ion Proton WT RNA-seq data downloaded from Ion Community and trimmed with cutadapt.
I got the following error:
[2014-06-11 09:37:40] Beginning TopHat run (v2.0.11)
-----------------------------------------------
[2014-06-11 09:37:40] Checking for Bowtie
Bowtie version: 2.2.3.0
[2014-06-11 09:37:40] Checking for Samtools
Samtools version: 0.1.18.0
[2014-06-11 09:37:40] Checking for Bowtie index files (genome)..
[2014-06-11 09:37:40] Checking for reference FASTA file
Warning: Could not find FASTA file /home/yzc/bowtie2-2.2.3/scripts/h_sapiens_37_asm.fa
[2014-06-11 09:37:40] Reconstituting reference FASTA file from Bowtie index
Executing: /home/yzc/bowtie2-2.2.3/bowtie2-inspect /home/yzc/bowtie2-2.2.3/scripts/h_sapiens_37_asm > ./tophat_out/tmp/h_sapiens_37_asm.fa
[2014-06-11 09:40:28] Generating SAM header for /home/yzc/bowtie2-2.2.3/scripts/h_sapiens_37_asm
[2014-06-11 09:40:28] Reading known junctions from GTF file
[2014-06-11 09:40:33] Preparing reads
left reads: min. length=16, max. length=368, 45510880 kept reads (611 discarded)
Warning: short reads (<20bp) will make TopHat quite slow and take large amount of memory because they are likely to be mapped in too many places
[2014-06-11 10:01:03] Building transcriptome data files ./tophat_out/tmp/h_sapiens_37_asm
[2014-06-11 10:01:07] Building Bowtie index from h_sapiens_37_asm.fa
[FAILED]
Error: Couldn't build bowtie index with err = 1
I have also tried tophat2 with the same datasets but without applying the gtf file, which resulted in the following error:
Warning: short reads (<20bp) will make TopHat quite slow and take large amount of memory because they are likely to be mapped in too many places
[2014-06-10 15:13:37] Mapping left_kept_reads to genome h_sapiens_37_asm with Bowtie2
[sam_read1] missing header? Abort!
[FAILED]
Error running bowtie:
Error while flushing and closing output
terminate called after throwing an instance of 'int'
(ERR): bowtie2-align died with signal 6 (ABRT) (core dumped)
When I tried to view left_kept_reads.bam with samtools, it says that the file doesn't have an EOF marker
Any help would be much appreciated!
Usually the "terminate called after throwing an instance of 'int'" error message indicates that one of your input fastq files is messed up. Try just taking a subset and see if that fixes that error.
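As a rough sketch of that suggestion, assuming a plain FASTQ with exactly four lines per record (no wrapped sequence lines): the helpers below take the first N reads and run a basic sanity check. The function names and subset.fastq are made up for illustration.

```shell
# Hypothetical helpers; assume 4 lines per FASTQ record (no line-wrapped sequences).

# Print the first N reads of a FASTQ file: fastq_head FILE N
fastq_head() {
    head -n "$(( $2 * 4 ))" "$1"
}

# Basic sanity check: reports the first malformed record and returns non-zero.
fastq_check() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { print "line " NR ": header does not start with @"; bad = 1; exit }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" { print "line " NR ": separator does not start with +"; bad = 1; exit }
        NR % 4 == 0 && length($0) != seqlen { print "line " NR ": quality length differs from sequence length"; bad = 1; exit }
        END {
            if (!bad && NR % 4 != 0) { print "file ends mid-record"; bad = 1 }
            exit bad + 0
        }
    ' "$1"
}

# Example usage (adaptorTrim.fastq is from this thread; subset.fastq is a made-up name):
#   fastq_check adaptorTrim.fastq
#   fastq_head adaptorTrim.fastq 100000 > subset.fastq
```

If the subset aligns cleanly but the full file does not, the damaged record is probably further down the file, and the check above should point at the first line where the four-line structure breaks.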
To see why the indexing step of the transcriptome failed, look in the run log to find the exact command that was run and then run it yourself (hopefully the ./tophat_out/tmp/h_sapiens_37_asm.fa file still exists).
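Something like the following could pull that command out of the log. This is only a sketch: the helper name is made up, and the log path in the usage comment is an assumption about where your TopHat version writes its run log, so adjust it if your output directory looks different.

```shell
# Hypothetical helper: list bowtie2-build invocations recorded in a TopHat run log.
# Assumes the log is plain text with one command per line.
find_build_cmd() {
    grep -n 'bowtie2-build' "$1"
}

# Example usage (path is an assumption, not confirmed in this thread):
#   find_build_cmd tophat_out/logs/run.log
# Then paste the reported command into the shell and read its error output directly.
```

Running the build by hand should show the actual message from bowtie2-build instead of just "err = 1".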
what does:
and
show?
It's taking only 4 seconds to build your transcriptome data files so it seems like something is off there.
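One thing worth ruling out (a guess, not something the log confirms) is a mismatch between the sequence names in the GTF and those in the Bowtie index, since TopHat requires the first column of the GTF to match the reference names in the index. A minimal sketch for comparing them, with a made-up helper name and made-up temporary file names:

```shell
# Hypothetical helper: unique sequence names (column 1) of a GTF, skipping comment lines.
gtf_seqnames() {
    grep -v '^#' "$1" | cut -f 1 | sort -u
}

# Example usage: print names that occur in the GTF but not in the index.
# Paths are the ones from the original command; *_names.txt are made-up file names.
#   bowtie2-inspect --names ~/bowtie2-2.2.3/scripts/h_sapiens_37_asm | cut -d ' ' -f 1 | sort -u > index_names.txt
#   gtf_seqnames ../RNAseq_test/h_sapiens_37_asm.gtf > gtf_names.txt
#   comm -13 index_names.txt gtf_names.txt
```

If nearly every GTF name shows up in that last listing, very few transcripts can be extracted from the genome, which would be consistent with the transcriptome step finishing in a few seconds.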