Question

RNA-seq analysis for Ion Proton Transcriptome data - tophat2 error

1

11.2 years ago

Renee X ▴ 20

Hi there, I'm an undergrad working as a summer student in a lab and I'm having trouble with analyzing the RNA-seq data generated by Ion Proton.

The pipeline suggested by Life Technologies involves a 2-step alignment, the first of which is done with tophat2, and I'm stuck on this step...

the command suggested by Life Technologies:

tophat2 -p 12 --keep-fasta-order --GTF known_genes.gtf \
hg19_bowtie2_index adaptorTrim.fastq

my command:

tophat2 -p 2 --keep-fasta-order -G ../RNAseq_test/h_sapiens_37_asm.gtf ~/bowtie2-2.2.3/scripts/h_sapiens_37_asm adaptorTrim.fastq

the h_sapiens_37_asm.gtf file is actually the ncbi 37.2 build
h_sapiens_37_asm is the name of the bowtie2 index files built from running the make_h_sapiens_ncbi37.sh script
adaptorTrim.fastq is the Ion Proton WT RNA-seq data downloaded from Ion Community and trimmed with cutadapt

and I got the following error

[2014-06-11 09:37:40] Beginning TopHat run (v2.0.11)
-----------------------------------------------
[2014-06-11 09:37:40] Checking for Bowtie
          Bowtie version:     2.2.3.0
[2014-06-11 09:37:40] Checking for Samtools
        Samtools version:     0.1.18.0
[2014-06-11 09:37:40] Checking for Bowtie index files (genome)..
[2014-06-11 09:37:40] Checking for reference FASTA file
    Warning: Could not find FASTA file /home/yzc/bowtie2-2.2.3/scripts/h_sapiens_37_asm.fa
[2014-06-11 09:37:40] Reconstituting reference FASTA file from Bowtie index
  Executing: /home/yzc/bowtie2-2.2.3/bowtie2-inspect /home/yzc/bowtie2-2.2.3/scripts/h_sapiens_37_asm > ./tophat_out/tmp/h_sapiens_37_asm.fa
[2014-06-11 09:40:28] Generating SAM header for /home/yzc/bowtie2-2.2.3/scripts/h_sapiens_37_asm
[2014-06-11 09:40:28] Reading known junctions from GTF file
[2014-06-11 09:40:33] Preparing reads
     left reads: min. length=16, max. length=368, 45510880 kept reads (611 discarded)
Warning: short reads (<20bp) will make TopHat quite slow and take large amount of memory because they are likely to be mapped in too many places
[2014-06-11 10:01:03] Building transcriptome data files ./tophat_out/tmp/h_sapiens_37_asm
[2014-06-11 10:01:07] Building Bowtie index from h_sapiens_37_asm.fa
    [FAILED]
Error: Couldn't build bowtie index with err = 1

I have also tried tophat2 with the same datasets but without applying the gtf file, which resulted in the following error:

Warning: short reads (<20bp) will make TopHat quite slow and take large amount of memory because they are likely to be mapped in too many places
[2014-06-10 15:13:37] Mapping left_kept_reads to genome h_sapiens_37_asm with Bowtie2
[sam_read1] missing header? Abort!
    [FAILED]
Error running bowtie:
Error while flushing and closing output
terminate called after throwing an instance of 'int'
(ERR): bowtie2-align died with signal 6 (ABRT) (core dumped)

When I tried to view left_kept_reads.bam with samtools, it says that the file doesn't have an EOF marker

Any help would be much appreciated!

RNA-Seq Torrent Ion Proton • 7.1k views

ADD COMMENT • link updated 3.8 years ago by Ram 45k • written 11.2 years ago by Renee X ▴ 20

0

Usually the "terminate called after throwing an instance of 'int'" error message indicates that one of your input fastq files is messed up. Try just taking a subset and see if that fixes that error.
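A minimal sketch of the subset idea, plus a basic sanity check of the FASTQ itself. The helper names `fastq_head` and `fastq_check` are made up for illustration, and both assume plain four-line FASTQ records (no wrapped sequences), which may not hold for every file.

```shell
# Hypothetical helpers; assume 4 lines per FASTQ record.

# Print the first N reads of a FASTQ file.
# Usage: fastq_head adaptorTrim.fastq 100000 > subset.fastq
fastq_head() {
    head -n "$(( $2 * 4 ))" "$1"
}

# Report the first malformed record, if any, and return non-zero.
# Checks: header starts with '@', separator starts with '+',
# sequence and quality strings have equal length, no truncated last record.
# Usage: fastq_check adaptorTrim.fastq
fastq_check() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { print "bad header at line " NR; bad = 1; exit }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" { print "bad separator at line " NR; bad = 1; exit }
        NR % 4 == 0 && length($0) != seqlen { print "length mismatch at line " NR; bad = 1; exit }
        END {
            if (!bad && NR % 4 != 0) { print "truncated record at end of file"; bad = 1 }
            exit bad + 0
        }
    ' "$1"
}
```

If the subset runs cleanly but the full file does not, the problem is more likely in a later part of the file or in resource limits than in the command itself.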

To see why the indexing step of the transcriptome failed, look in the run log to find the exact command that was run and then run it yourself (hopefully the ./tophat_out/tmp/h_sapiens_37_asm.fa file still exists).
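One way to pull that command out of the log, as a rough sketch. It assumes the default output directory `tophat_out` and that the log is at `logs/run.log` inside it; the function name `show_index_cmd` is made up for illustration.

```shell
# Hypothetical helper: print every logged line that mentions bowtie2-build,
# prefixed with its line number. The default log path is an assumption.
# Usage: show_index_cmd                 (uses tophat_out/logs/run.log)
#        show_index_cmd path/to/run.log
show_index_cmd() {
    grep -n 'bowtie2-build' "${1:-tophat_out/logs/run.log}"
}
```

Re-running the printed command by hand should show the actual bowtie2-build error message rather than just `err = 1`.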

ADD REPLY • link 11.2 years ago by Devon Ryan 105k

0

what does:

grep -m5 ">" tophat_out/tmp/h_sapiens_37_asm.fa

and

head ../RNAseq_test/h_sapiens_37_asm.gtf

show?

It's taking only 4 seconds to build your transcriptome data files so it seems like something is off there.
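Presumably the point of those two commands is to see whether the sequence names in the GTF match the ones in the reference. A small sketch that does the comparison directly is below; the function names are made up for illustration, and it assumes a tab-separated GTF with the sequence name in column 1.

```shell
# Hypothetical helpers for comparing sequence names.

# Names from FASTA headers: text after '>' up to the first whitespace.
fasta_names() {
    grep '^>' "$1" | sed 's/^>//' | awk '{ print $1 }' | sort -u
}

# Names from column 1 of a GTF, skipping comment and empty lines.
gtf_names() {
    awk -F '\t' '!/^#/ && NF { print $1 }' "$1" | sort -u
}

# Names that appear in the GTF but not in the FASTA.
# Usage: gtf_only_names tophat_out/tmp/h_sapiens_37_asm.fa ../RNAseq_test/h_sapiens_37_asm.gtf
gtf_only_names() {
    fa_list=$(mktemp)
    fasta_names "$1" > "$fa_list"
    gtf_names "$2" | comm -23 - "$fa_list"
    rm -f "$fa_list"
}
```

If this prints most or all of the GTF's sequence names (for example `1` in the GTF versus `chr1` in the reference), few or no transcripts can be extracted from the reference, which would be consistent with the step finishing suspiciously fast.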

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 11.2 years ago by brentp 24k

Ram · Answer 1 · 2014-06-11

1

11.2 years ago

Charles Warden 8.3k

I think you are following the best practice, so that is good.

Most of the errors seem to relate to the reference:

Is your reference specifically indexed for Bowtie2? An unindexed reference or a reference indexed for Bowtie1 won't work.

http://bowtie-bio.sourceforge.net/tutorial.shtml#newi

If I recall correctly, I think indexing your own reference worked better than using a pre-defined index for Bowtie2. Based upon the fact that there is an error saying that there is no .fasta file, my guess is that you may be using a pre-defined index.
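A quick way to check what is actually sitting at the index prefix, as a sketch. It assumes a standard (small) Bowtie2 index, which consists of six files ending in `.1.bt2`, `.2.bt2`, `.3.bt2`, `.4.bt2`, `.rev.1.bt2` and `.rev.2.bt2` (Bowtie1 indexes end in `.ebwt` instead). The function name `check_bt2_index` is made up for illustration.

```shell
# Hypothetical helper: verify that all six small-index Bowtie2 files exist
# for a given prefix, and note whether a matching .fa sits next to them.
# Returns non-zero if any index file is missing.
# Usage: check_bt2_index ~/bowtie2-2.2.3/scripts/h_sapiens_37_asm
check_bt2_index() {
    missing=0
    for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        if [ ! -f "$1.$suffix" ]; then
            echo "missing: $1.$suffix"
            missing=1
        fi
    done
    # A missing FASTA is not fatal: per the log in the question, TopHat
    # rebuilds it from the index with bowtie2-inspect.
    if [ ! -f "$1.fa" ]; then
        echo "note: $1.fa not found next to the index"
    fi
    return "$missing"
}
```

Keeping the original FASTA beside the index as `<prefix>.fa` avoids the reconstitution step shown in the log, which took about three minutes in the run above.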

ADD COMMENT • link updated 5.9 years ago by Ram 45k • written 11.2 years ago by Charles Warden 8.3k

0

Yes, the reference is specifically indexed for Bowtie2. There's a directory called 'scripts' included in the latest version of Bowtie2, which contains .sh files for building Bowtie2 indexes. I will try indexing the reference manually. Thank you:)

ADD REPLY • link 11.2 years ago by Renee X ▴ 20

Ram · Answer 2 · 2014-06-18

1

11.2 years ago

Renee X ▴ 20

Turns out the error was caused by some hardware limitations (i.e. RAM)... Thank you guys so much for the help!

Renee

ADD COMMENT • link updated 5.6 years ago by Ram 45k • written 11.2 years ago by Renee X ▴ 20