Question

Speed up Tophat2 running

1

7.6 years ago

yuabrahamliu ▴ 60

Hi all. I'm starting to use tophat2 to map dUTP strand-specific, paired-end RNA-seq data to the mm9 genome. Each sample pair consists of two files of about 500M each. However, I find that the run time is a bit too long. My command line is:

tophat2 -o tophatoutput -p 28 --no-coverage-search --library-type=fr-firststrand --transcriptome-index=topHat/mm9 bowtie2/mm9 sample.R1.trimmed_1.fastq.gz sample.R2.trimmed_2.fastq.gz

I started the command at 7:30 pm with the thread count set as high as 28, but it is now 10:30 pm and not even one sample pair has finished. I have 6 sample pairs, so the total run time may be too long. Is there something wrong with my command line? Or can I do something to speed up the run? I would appreciate your suggestions. Below are the messages reported while the program is running. It is currently stuck at the "Reporting output tracks" step. Thank you.

[2017-11-17 19:27:08] Checking for Bowtie
    Bowtie version: 2.2.5.0
[2017-11-17 19:27:08] Checking for Bowtie index files (transcriptome)..
    Found both Bowtie1 and Bowtie2 indexes.
[2017-11-17 19:27:08] Checking for Bowtie index files (genome)..
[2017-11-17 19:27:08] Checking for reference FASTA file
[2017-11-17 19:27:08] Generating SAM header for /HPCTMP_NOBKUP/wl314/data/other/ReadsMapIndexFiles/bowtie2/mm9
[2017-11-17 19:27:11] Reading known junctions from GTF file
[2017-11-17 19:27:14] Preparing reads
    left reads: min. length=20, max. length=51, 20623862 kept reads (3111 discarded)
    right reads: min. length=20, max. length=51, 20619508 kept reads (7465 discarded)
[2017-11-17 19:36:47] Using pre-built transcriptome data..
[2017-11-17 19:36:48] Mapping left_kept_reads to transcriptome mm9 with Bowtie2
[2017-11-17 19:43:03] Mapping right_kept_reads to transcriptome mm9 with Bowtie2
[2017-11-17 19:49:29] Resuming TopHat pipeline with unmapped reads
[2017-11-17 19:49:29] Mapping left_kept_reads.m2g_um to genome mm9 with Bowtie2
[2017-11-17 20:11:09] Mapping left_kept_reads.m2g_um_seg1 to genome mm9 with Bowtie2 (1/2)
[2017-11-17 20:14:28] Mapping left_kept_reads.m2g_um_seg2 to genome mm9 with Bowtie2 (2/2)
[2017-11-17 20:18:38] Mapping right_kept_reads.m2g_um to genome mm9 with Bowtie2
[2017-11-17 20:40:32] Mapping right_kept_reads.m2g_um_seg1 to genome mm9 with Bowtie2 (1/2)
[2017-11-17 20:44:06] Mapping right_kept_reads.m2g_um_seg2 to genome mm9 with Bowtie2 (2/2)
[2017-11-17 20:48:40] Searching for junctions via segment mapping
[2017-11-17 20:54:01] Retrieving sequences for splices
[2017-11-17 20:55:42] Indexing splices
    Building a SMALL index
[2017-11-17 20:56:11] Mapping left_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/2)
[2017-11-17 20:58:59] Mapping left_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/2)
[2017-11-17 21:02:31] Joining segment hits
[2017-11-17 21:05:46] Mapping right_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/2)
[2017-11-17 21:08:25] Mapping right_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/2)
[2017-11-17 21:12:00] Joining segment hits

[2017-11-17 21:15:20] Reporting output tracks
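
To see where the time goes, the gaps between consecutive timestamps in a log like this one can be turned into per-step durations. The following is only a minimal shell sketch, not something TopHat provides: the function name `step_durations` is made up for illustration, and it assumes the messages have been saved to a file with one timestamped entry per line. The last step listed has no end time yet, so it is not reported.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "minutes:seconds  step" for every finished step
# of a TopHat-style log read from standard input.
step_durations() {
  awk '
    # Only lines starting with "[YYYY-MM-DD HH:MM:SS]" mark the start of a step.
    /^\[[0-9]+-[0-9]+-[0-9]+ [0-9]+:[0-9]+:[0-9]+\]/ {
      split(substr($2, 1, 8), t, ":")
      now = t[1] * 3600 + t[2] * 60 + t[3]
      if (seen) {
        d = now - prev
        if (d < 0) d += 86400  # assumes at most one midnight between two entries
        printf "%3d:%02d  %s\n", int(d / 60), d % 60, step
      }
      prev = now; seen = 1
      step = $0
      sub(/^\[[^]]*\] /, "", step)  # drop the timestamp, keep the step name
    }
  '
}
```

It would be called as `step_durations < run.log`, where `run.log` stands for wherever the messages were saved. In the log posted here, the two genome mappings of the unmapped reads take about 22 minutes each, and every other finished step takes under 10 minutes.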

RNA-Seq tophat2 dUTP • 3.6k views

ADD COMMENT • link 7.6 years ago by yuabrahamliu ▴ 60

3

I'm starting to use tophat2

Since you have just started with TopHat, this is also the ideal moment to stop using it.

You should know that the old 'Tuxedo' pipeline of TopHat and Cufflinks is no longer the advisable tool for RNA-seq analysis. The software is deprecated or in low-maintenance mode and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR and bbmap, or pseudo-alignment using salmon.
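
As a rough illustration of what switching would look like, here is a sketch of a HISAT2 call comparable to the tophat2 command in the question. It is a dry run that only prints the command. The function name `hisat2_cmd`, the index prefix `hisat2/mm9` and the output name `sample.sam` are assumptions made for the example; the index would have to be built beforehand with hisat2-build.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (do not execute) a HISAT2 command for one sample pair.
hisat2_cmd() {
  local index=$1 r1=$2 r2=$3 out=$4 threads=${5:-8}
  # --rna-strandness RF is the HISAT2 counterpart of TopHat's fr-firststrand (dUTP).
  # --dta reports alignments tailored for transcript assemblers such as StringTie.
  echo hisat2 -p "$threads" --rna-strandness RF --dta \
       -x "$index" -1 "$r1" -2 "$r2" -S "$out"
}

# hisat2/mm9 and sample.sam are hypothetical names used only for illustration.
hisat2_cmd hisat2/mm9 sample.R1.trimmed_1.fastq.gz sample.R2.trimmed_2.fastq.gz sample.sam 28
```

Once the printed command looks right for your files, it can be run directly. The SAM output would normally be sorted and converted to BAM (for example with samtools sort) before StringTie.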

ADD REPLY • link 7.6 years ago by WouterDeCoster 48k

1

It is not uncommon for these jobs to run for a full day on large datasets. That is why you start them on your way home and forget about them until the next day.

ADD REPLY • link 7.6 years ago by GenoMax 151k

1

... or the whole weekend.

ADD REPLY • link 7.6 years ago by michael.ante ★ 4.0k

0

TopHat usually needs a lot of time. The "Reporting output tracks" step is often the most time-consuming step, and you don't see any progress while it runs...

If you need something faster, try bbmap or STAR.

ADD REPLY • link 7.6 years ago by michael.ante ★ 4.0k

1

Just to add to Michael's comment: be aware that the TopHat2/Cufflinks suite of programs (or 'Tuxedo', as they called it) is no longer supported. The replacements are HISAT2/StringTie ('New Tuxedo'). See their publication here: https://www.nature.com/articles/nprot.2016.095

Coincidentally, I need a new tuxedo ('suit' in British/Irish English).

ADD REPLY • link 7.6 years ago by Kevin Blighe 89k

0

It really took about 6 hours to complete the first sample. Thank you.

ADD REPLY • link 7.6 years ago by yuabrahamliu ▴ 60

0

Please use ADD COMMENT or ADD REPLY to respond to previous comments, so that this thread remains logically structured and easy to follow. I have now moved your post, but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

If an answer was helpful you should upvote it.

ADD REPLY • link 7.6 years ago by WouterDeCoster 48k