Question

How can I predict the time of calculations for RNAseq?

0

7.0 years ago

agata88 ▴ 870

Hi all!

I am going to have paired-end (PE) reads for human RNA-seq (around 70 million reads). How can I predict whether my computer has enough disk space and memory to map the reads to the reference genome with TopHat or any other RNA-seq mapping algorithm?

I would like to decide whether I need to use the cloud for this calculation or whether I can run it on my local computer.

I have 1 TB of disk space, 64 GB of RAM, and 10 cores.

Thank you in advance,

Agata

RNA-Seq • 1.5k views

updated 6.8 years ago by Arup Ghosh 3.2k • written 7.0 years ago by agata88 ▴ 870

3

You should know that the old 'Tuxedo' pipeline of TopHat(2) and Cufflinks is no longer the "advisable" tool for RNA-seq analysis. The software is deprecated or in low-maintenance mode and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR or BBMap, or pseudo-alignment using Salmon.

Please stop using Tophat https://t.co/Es4ohxOEyx Cole and I developed the method in *2008*. It was greatly improved in TopHat2 then HISAT & HISAT2. There is no reason to use it anymore. I have been saying this for years yet it has more citations this year than last #methodsmatter
— Lior Pachter (@lpachter) December 2, 2017

7.0 years ago by WouterDeCoster 47k

0

Wow, I am surprised. Last year I performed an RNA-seq analysis with TopHat and Cufflinks and the results were fine. I was going to repeat that pipeline this year. Thanks for letting me know. I will go with other solutions.

7.0 years ago by agata88 ▴ 870

1

If you are flexible on time then it should work with the specs posted above. How many of these 70M read samples do you expect to do?

I suggest that you use BBMap. It requires about 30 GB of RAM for the human genome. STAR would need about the same. You can find out how long a million reads take by adding the reads=1000000 parameter to the bbmap command line, and then extrapolate from there.
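
The extrapolation can be sketched in a few lines of Python. This is only a rough sketch and not part of BBMap: the function name `extrapolate_runtime` and the example timings are hypothetical, and it assumes that run time is a fixed start-up cost (such as loading the index) plus a cost that grows linearly with the number of reads. Timing two subsets (say reads=1000000 and reads=2000000) lets you separate the two.

```python
def extrapolate_runtime(n_small, t_small, n_large, t_large, n_target):
    """Linearly extrapolate wall-clock seconds for n_target reads.

    Fits t = overhead + per_read * n through two timed test runs.
    """
    per_read = (t_large - t_small) / (n_large - n_small)
    overhead = t_small - per_read * n_small
    return overhead + per_read * n_target


# Hypothetical timings in seconds: replace with your own measurements.
estimate = extrapolate_runtime(
    n_small=1_000_000, t_small=150.0,
    n_large=2_000_000, t_large=200.0,
    n_target=70_000_000,
)
print(f"Estimated time for one 70M-read sample: {estimate / 3600:.2f} h")
```

Multiply the result by the number of samples to get a figure for the whole batch. A single 1M-read run also works, but it will overestimate if the start-up cost is large.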

7.0 years ago by GenoMax 147k

0

I have 12 samples. Thanks for the tips. Although I am flexible on time, I would like to do it wisely.

7.0 years ago by agata88 ▴ 870

score 1 · Answer 1 · 2018-02-01

1

6.8 years ago

Arup Ghosh 3.2k

I guess you are using an Intel Xeon processor, so with hyper-threading the number of threads will be 10*2=20. If you use TopHat2 with ~8 threads per process it will take >=4 hrs per sample. The complete analysis will take around a day using Tuxedo2. But HISAT2 is a huge improvement: it took ~30 mins for mouse RNA-seq PE data with 30 million reads.
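
To see how the per-sample figure turns into a batch estimate, here is a back-of-the-envelope sketch in Python. It takes the numbers from this post (20 threads, 8 threads per process, about 4 h per sample, 12 samples) and assumes, optimistically, that alignments running side by side do not slow each other down through disk or memory contention. The function name `batch_hours` is made up for illustration.

```python
import math


def batch_hours(n_samples, hours_per_sample, total_threads, threads_per_job):
    """Wall-clock hours to align all samples, running jobs side by side."""
    # How many alignment processes fit on the machine at once.
    concurrent_jobs = max(1, total_threads // threads_per_job)
    # Samples are processed in rounds of `concurrent_jobs` at a time.
    rounds = math.ceil(n_samples / concurrent_jobs)
    return rounds * hours_per_sample


# Figures quoted above: 20 threads, 8 per TopHat2 process, ~4 h per sample.
tophat_hours = batch_hours(12, 4.0, total_threads=20, threads_per_job=8)
print(f"TopHat2 alignment, 12 samples: ~{tophat_hours:.0f} h")
```

With these inputs two jobs run at a time, giving six rounds of 4 h, which is in line with the "around a day" figure.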

Write the alignment output as BAM files to save a lot of space. Since you have 12 samples, the maximum space required for the analysis should be within 200 GB.
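
A quick way to sanity-check the disk side is to compare that budget against the free space on the working drive. The sketch below uses Python's standard `shutil.disk_usage`; the 200 GB total is the figure from this answer, while the path `"."` and the helper name `fits_on_disk` are placeholders for illustration.

```python
import shutil


def fits_on_disk(required_gb, path="."):
    """Return (fits, free_gb) for the filesystem containing `path`."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb, free_gb


n_samples = 12
total_budget_gb = 200  # upper bound quoted in this answer
per_sample_gb = total_budget_gb / n_samples

ok, free_gb = fits_on_disk(total_budget_gb)
print(f"Budget per sample: ~{per_sample_gb:.1f} GB")
print(f"Free space: {free_gb:.0f} GB -> {'enough' if ok else 'not enough'}")
```

Run it from the directory where the alignments will be written, since free space is reported per filesystem.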