RNAseq computer resources for TopHat and Cufflinks
6.2 years ago
agata88 ▴ 870

Hi all!

I would like to ask about the computing requirements for RNAseq analysis with TopHat2 and Cufflinks. I have 24 samples with ~20M PE reads each. The genome size is 6 GB. Can anyone who has done a similar analysis give me a hint about how much computing resource I will need to run the mapping with TopHat2 and the downstream analysis with Cuffmerge and Cuffdiff?

Many thanks for any suggestions.

Best, Agata

Cufflinks RNAseq • 2.4k views

Please do not use TopHat2/Cufflinks anymore.


What should I use instead? I used this approach a few years ago and it did a good job.


The old Tuxedo pipeline (TopHat/Cufflinks, 2012) is now out of date. Prefer the new Tuxedo pipeline, HISAT/StringTie/Ballgown (2016), or Kallisto/Sleuth (2016).

You can also use STAR or HISAT2 as the aligner, featureCounts or HTSeq for counting, and edgeR or DESeq2 for differential expression.

The authors and the community agree that TopHat/Cufflinks is no longer the most accurate pipeline.


OK, thanks. Do you know which approach will need fewer computing resources?


I like to use STAR/featureCounts/DESeq2, or even STAR's built-in counting option followed by DESeq2, but STAR needs around 30 GB of RAM for a human genome. You can try HISAT2 instead, which needs 6 GB of RAM for a human genome.

Plus, DESeq2 has a very good vignette with a lot of examples.
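
To make the HISAT2 route a bit more concrete, here is a minimal Python sketch that only assembles one HISAT2 command line per paired-end sample (it does not run anything). The sample names, the `fastq` directory, the `_R1`/`_R2` file naming, and the `genome_index` prefix are all hypothetical placeholders; the only HISAT2 options used are `-p` (threads), `-x` (index prefix), `-1`/`-2` (paired FASTQ files), and `-S` (SAM output).

```python
def hisat2_command(sample, index_prefix, fastq_dir, threads=4):
    """Return a HISAT2 argument list for one paired-end sample.

    File naming (<sample>_R1.fastq.gz / <sample>_R2.fastq.gz) is an
    assumption for illustration; adapt it to your own data layout.
    """
    return [
        "hisat2",
        "-p", str(threads),                         # number of alignment threads
        "-x", index_prefix,                         # prefix of the HISAT2 index
        "-1", f"{fastq_dir}/{sample}_R1.fastq.gz",  # mate 1 reads
        "-2", f"{fastq_dir}/{sample}_R2.fastq.gz",  # mate 2 reads
        "-S", f"{sample}.sam",                      # SAM output file
    ]


# 24 hypothetical sample names: sample01 ... sample24
samples = [f"sample{i:02d}" for i in range(1, 25)]

# One command per sample; each list could be passed to subprocess.run().
commands = [hisat2_command(s, "genome_index", "fastq") for s in samples]
```

Each entry in `commands` can be handed to `subprocess.run(cmd, check=True)` or written to a job script for a scheduler, one sample per job.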


For your information:

tophat


Yes, I know. Thanks.

6.2 years ago

Strictly speaking, there hasn't actually been a formal "answer" provided, so I will respond here (instead of the two comment threads):

I realize that the developer has said to stop using TopHat, but I have definitely seen situations where a TopHat alignment was more helpful than a STAR / HISAT alignment. For example, I think some programs may have been developed with TopHat alignments (I know of at least one case where I had to use TopHat2 to validate a known splice junction difference), and the more stringent alignment requirements may be useful if you want to avoid aligning contaminants.

In most cases, for something like gene expression, I think you will most likely get similar results (so, if you already have a STAR / HISAT alignment with reasonable results, you probably don't need to also test a TopHat alignment). Likewise, if you see something unexpected with a TopHat alignment, I think there is a decent chance you will see the same trend with the STAR / HISAT alignment (although I would say that is a situation where it may be worth testing the effect of the aligner for your specific project, even if that ultimately doesn't change the result).

However, to answer your original question, I would usually run TopHat2 with 4 threads and 8 GB of RAM. I don't usually run cufflinks/cuffmerge, but it may be worth testing unique-read quantification if something about the transcript estimations doesn't seem right (but I think having replicates can help make the less accurate estimates less significant, at least if you are using unique counts).
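
As a rough planning aid, the per-job figures above (4 threads, 8 GB of RAM) can be turned into an estimate of how many alignments fit on a machine at once and how many batches 24 samples would take. This is only a sketch: the function name and the example machine sizes are hypothetical, and the 8 GB figure is my usual setting rather than a measured requirement for a 6 GB genome, so treat it as a starting point and check actual memory use on one sample first.

```python
import math


def plan_alignment_jobs(n_samples, total_ram_gb, total_cores,
                        ram_per_job_gb=8, threads_per_job=4):
    """Estimate concurrent alignment jobs and the number of batches.

    Defaults reflect the 4 threads / 8 GB per TopHat2 job mentioned above;
    both are assumptions to be adjusted after a test run.
    """
    by_ram = total_ram_gb // ram_per_job_gb      # jobs that fit in memory
    by_cores = total_cores // threads_per_job    # jobs that fit on the CPUs
    concurrent = min(by_ram, by_cores)
    if concurrent < 1:
        raise ValueError("machine is too small for a single job")
    batches = math.ceil(n_samples / concurrent)  # sequential rounds needed
    return concurrent, batches


# Hypothetical machine: 64 GB of RAM and 16 cores, 24 samples to align.
concurrent, batches = plan_alignment_jobs(24, total_ram_gb=64, total_cores=16)
print(f"{concurrent} jobs at a time, {batches} batches")
```

On that hypothetical machine the CPU count is the limit (16 cores / 4 threads = 4 jobs), so the 24 samples would run in 6 batches; with only 16 GB of RAM, memory becomes the limit instead.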
