Question

Computing resources for RNA-seq with TopHat and Cufflinks

Asked 6.2 years ago by agata88

Hi all!

I would like to ask about the computational requirements for an RNA-seq analysis with TopHat2 and Cufflinks. I have 24 samples with ~20M paired-end reads. The genome size is 6 GB. Could anyone who has done a similar analysis give me a hint about how much computing power and memory I will need to run the mapping with TopHat2 and the downstream analysis with Cuffmerge and Cuffdiff?

Many thanks for any suggestions.

Best, Agata

Tags: Cufflinks, RNAseq

Last updated 6.2 years ago by Charles Warden

Please do not use TopHat2/Cufflinks anymore.

Reply by Bastien Hervé · 6.2 years ago

What should I use instead? I used this approach a few years ago and it did a good job.

Reply by agata88 · 6.2 years ago

The old Tuxedo pipeline, TopHat/Cufflinks (2012), is now out of date; prefer the new Tuxedo pipeline HISAT/StringTie/Ballgown (2016) or Kallisto/Sleuth (2016).
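As a rough illustration of the new Tuxedo steps, the Python sketch below only builds and prints the per-sample command lines; it does not run anything. The index, annotation and read file names and the thread count are hypothetical placeholders. For brevity it quantifies directly against a reference annotation (StringTie -e -B, which writes Ballgown input tables) and skips the per-sample assembly and stringtie --merge steps of the full published protocol, so check the flags against the HISAT2, samtools and StringTie manuals for your installed versions.

```python
import shlex


def new_tuxedo_commands(sample, r1, r2, index="genome_index", gtf="genes.gtf", threads=4):
    """Return HISAT2 -> samtools -> StringTie argument lists for one paired-end sample.

    All file names and defaults are hypothetical placeholders; nothing is executed.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    return [
        # Spliced alignment; --dta tailors the output for transcript assemblers.
        ["hisat2", "-p", str(threads), "--dta", "-x", index,
         "-1", r1, "-2", r2, "-S", sam],
        # Coordinate-sorted BAM, which StringTie expects as input.
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        # -e: estimate abundances of reference transcripts only; -B: write Ballgown tables.
        ["stringtie", "-p", str(threads), "-e", "-B", "-G", gtf,
         "-o", f"ballgown/{sample}/{sample}.gtf", bam],
    ]


if __name__ == "__main__":
    for cmd in new_tuxedo_commands("sample01", "sample01_R1.fastq.gz", "sample01_R2.fastq.gz"):
        print(shlex.join(cmd))
```

Building argument lists rather than shell strings keeps file names with spaces safe if you later pass them to subprocess.run.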

You can also use STAR or HISAT2 as the aligner, featureCounts or HTSeq for counting, and edgeR or DESeq2 for differential expression.
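A minimal sketch of the STAR + featureCounts combination, again only building the command lines in Python. The genome directory, annotation, sample names and thread count are hypothetical, the reads are assumed to be gzipped FASTQ (hence zcat), and --countReadPairs assumes Subread 2.0.2 or later (in older versions -p on its own counts fragments).

```python
import shlex


def star_command(sample, r1, r2, genome_dir="star_index", threads=4):
    """STAR argument list for one gzipped paired-end sample (placeholders; not executed)."""
    return ["STAR",
            "--runThreadN", str(threads),
            "--genomeDir", genome_dir,
            "--readFilesIn", r1, r2,
            "--readFilesCommand", "zcat",              # assumes .fastq.gz input
            "--outSAMtype", "BAM", "SortedByCoordinate",
            "--outFileNamePrefix", f"{sample}."]


def star_bam(sample):
    # STAR writes the sorted BAM as <prefix>Aligned.sortedByCoord.out.bam
    return f"{sample}.Aligned.sortedByCoord.out.bam"


def featurecounts_command(bams, gtf="genes.gtf", out="counts.txt", threads=4):
    """One featureCounts call that counts every BAM into a single gene x sample table."""
    return ["featureCounts",
            "-T", str(threads),
            "-p", "--countReadPairs",  # count fragments; --countReadPairs needs Subread >= 2.0.2
            "-a", gtf,
            "-o", out,
            *bams]


if __name__ == "__main__":
    samples = ["sample01", "sample02"]  # hypothetical sample names
    for s in samples:
        print(shlex.join(star_command(s, f"{s}_R1.fastq.gz", f"{s}_R2.fastq.gz")))
    print(shlex.join(featurecounts_command([star_bam(s) for s in samples])))
```

By default featureCounts does not count multi-mapping reads (that requires -M), so the resulting table is a unique-read count matrix that can go straight into edgeR or DESeq2.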

The authors and the community agree that TopHat/Cufflinks is no longer the most accurate pipeline.

Reply by Bastien Hervé · 6.2 years ago

OK, thanks. Do you know which approach needs fewer computing resources?

Reply by agata88 · 6.2 years ago

I like to use STAR/featureCounts/DESeq2, or even STAR (with its built-in counting option, --quantMode GeneCounts)/DESeq2, but STAR needs around 30 GB of RAM for a human genome. You can try HISAT2 instead, which needs about 6 GB of RAM for a human genome.
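For the 6 GB genome in the question, these human figures can be turned into a back-of-the-envelope check. This Python sketch assumes that aligner memory scales roughly linearly with genome size and reads the question's "6 GB" as about 6 Gb of sequence. For STAR that is roughly in line with the STAR manual's guidance of about 10 bytes of RAM per genome base; for HISAT2 it is only an assumption, and real usage also depends on index options and sorting buffers. The 32 GB machine in the example is hypothetical.

```python
# Rough RAM figures for a human-sized (~3 Gb) genome, as quoted in this thread.
HUMAN_GENOME_GB = 3.0
HUMAN_RAM_GB = {"STAR": 30.0, "HISAT2": 6.0}


def estimated_ram_gb(aligner, genome_gb):
    """Scale the human figure linearly with genome size (a rough assumption)."""
    return HUMAN_RAM_GB[aligner] * genome_gb / HUMAN_GENOME_GB


def aligners_that_fit(genome_gb, available_ram_gb):
    """Aligners whose estimated RAM fits on a machine with the given memory."""
    return [name for name in HUMAN_RAM_GB
            if estimated_ram_gb(name, genome_gb) <= available_ram_gb]


if __name__ == "__main__":
    genome_gb = 6.0  # genome size from the question, read as ~6 Gb
    for name in HUMAN_RAM_GB:
        print(f"{name}: ~{estimated_ram_gb(name, genome_gb):.0f} GB")  # STAR ~60, HISAT2 ~12
    print(aligners_that_fit(genome_gb, available_ram_gb=32))  # ['HISAT2']
```

Under this assumption, a genome twice the size of the human one would push STAR to roughly 60 GB, which is why HISAT2 is the safer choice on a modest server.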

Plus, DESeq2 has a very good vignette with a lot of examples.

Reply by Bastien Hervé · 6.2 years ago

For your information:

tophat

Reply by ATpoint · 6.2 years ago

Yes, I know. Thanks.

Reply by agata88 · 6.2 years ago

score 3 · Answer 1 · 2018-10-01

Strictly speaking, there hasn't actually been a formal "answer" provided, so I will respond here (instead of the two comment threads):

I realize that the developers have said to stop using TopHat, but I have definitely seen situations where a TopHat alignment was more helpful than a STAR / HISAT alignment. For example, I think some programs may have been developed using TopHat alignments (I know of at least one case where I had to use TopHat2 to validate a known splice junction difference), and TopHat's more stringent alignment requirements may be useful if you want to avoid aligning contaminants.

In most cases, for something like gene expression, I think you will most likely get similar results (so if you already have a STAR / HISAT alignment with reasonable results, you probably don't need to also test a TopHat alignment). Likewise, if you see something unexpected with a TopHat alignment, I think there is a decent chance you will see the same trend with a STAR / HISAT alignment (although that is a situation where it may be worth testing the effect of the aligner for your specific project, even if that ultimately doesn't change the result).

However, to answer your original question, I would usually run TopHat2 with 4 threads and 8 GB of RAM. I don't usually run Cufflinks/Cuffmerge, but it may be worth testing unique-read quantification if something about the transcript estimates doesn't seem right (although I think having replicates can help make the less accurate estimates less significant, at least if you are using unique counts).
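Taking the per-job figures of 4 threads and 8 GB of RAM at face value, a minimal Python sketch can build the TopHat2 command for one sample and estimate how many of the 24 samples could run side by side on a single machine. The Bowtie2 index, annotation and file names are hypothetical, the server size is made up for illustration, and the 8 GB figure may not hold for a genome twice the size of the human one. Wall-clock time is not estimated because the thread gives no figure for it.

```python
import math
import shlex

# Per-job resources quoted in the answer above for a single TopHat2 run.
THREADS_PER_JOB = 4
RAM_PER_JOB_GB = 8


def tophat2_command(sample, r1, r2, index="bowtie2_index", gtf="genes.gtf"):
    """TopHat2 argument list for one paired-end sample (placeholder names; not executed)."""
    return ["tophat2", "-p", str(THREADS_PER_JOB), "-G", gtf,
            "-o", f"tophat_out/{sample}", index, r1, r2]


def schedule(n_samples, cores, ram_gb):
    """Return (jobs that fit at once, number of sequential batches) on one machine."""
    concurrent = min(cores // THREADS_PER_JOB, ram_gb // RAM_PER_JOB_GB)
    if concurrent < 1:
        raise ValueError("machine is too small for even one TopHat2 job")
    return concurrent, math.ceil(n_samples / concurrent)


if __name__ == "__main__":
    print(shlex.join(tophat2_command("sample01", "sample01_R1.fastq.gz", "sample01_R2.fastq.gz")))
    # 24 samples from the question on a hypothetical 32-core, 64 GB server.
    print(schedule(24, cores=32, ram_gb=64))  # (8, 3)
```

Whichever resource runs out first (cores or RAM) sets the number of parallel jobs, so adding memory without adding cores, or vice versa, may not speed things up.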