Hi all!
I am going to have paired-end (PE) reads for human RNA-seq (around 70 million reads). How can I predict whether my computer has enough disk space and memory to map the reads to a reference genome with TopHat or any other RNA-seq mapping algorithm?
I would like to decide whether I need to use the cloud for this calculation or whether I can do it on my local computer.
I have 1 TB of disk space, 64 GB of RAM and 10 cores.
Thank you in advance,
Agata
You should know that the old 'Tuxedo' pipeline of TopHat(2) and Cufflinks is no longer the "advisable" tool for RNA-seq analysis. The software is deprecated or in low maintenance and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR and BBMap, or pseudo-alignment using Salmon.
Wow, I am surprised. Last year I performed RNA-seq analysis with TopHat and Cufflinks and the results were fine. I was going to repeat that pipeline this year. Thanks for letting me know. I will go with other solutions.
If you are flexible on time then it should work with the specs posted above. How many of these 70M read samples do you expect to do?
I suggest that you use BBMap. It requires about 30 GB of RAM for the human genome, and STAR would need about the same. You can find out how long a million reads take by adding the
reads=1000000
parameter to the bbmap command line, and then extrapolate from there.
I have 12 samples. Thanks for the tips. Although I am flexible on time, I would like to do it wisely.
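The extrapolation from a reads=1000000 test run to 12 samples of about 70 million reads each could be sketched roughly as below. This is only a minimal sketch: the function name extrapolate_hours and the 120-second test timing are hypothetical placeholders, not measurements, and it assumes that run time scales linearly with read count and that the 70 million figure counts the same unit as the reads= limit. Because the timed test run also includes the fixed cost of loading the index, a linear scale-up will tend to overestimate rather than underestimate.

```python
def extrapolate_hours(test_seconds, test_reads, total_reads, n_samples=1):
    """Linearly scale a timed test run up to the full data set (in hours)."""
    seconds_per_read = test_seconds / test_reads
    return seconds_per_read * total_reads * n_samples / 3600.0


# Hypothetical timing: replace 120 with the wall-clock seconds you measured
# for your own run with reads=1000000.
test_seconds = 120

hours_one = extrapolate_hours(test_seconds, 1_000_000, 70_000_000)
hours_all = extrapolate_hours(test_seconds, 1_000_000, 70_000_000, n_samples=12)
print(f"one sample: {hours_one:.1f} h, all 12 samples: {hours_all:.1f} h")
```

Plugging in your own measured time gives a rough figure to weigh against the cost of running the job in the cloud.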