Question

Alternatives to Tophat and Trinity for calculating differentially expressed genes

1

9.3 years ago

Being Bioinformatician ▴ 250

Respected Members,

Is there any tool other than Tophat and Trinity that I can use to calculate differentially expressed genes? I am looking for a standalone tool that requires less RAM.

Thanks in advance

RNA-Seq • 4.8k views

updated 2.0 years ago by Ram 44k • written 9.3 years ago by Being Bioinformatician ▴ 250

1

I don't want to sound pedantic or incorrect here, but I think there is a bit of confusion... Tophat is a splice-aware aligner, trinity is a de-novo assembler and neither of the two is for differential expression. The answers below seem to be a mix of these tasks depending on how the question was interpreted. Maybe you should clarify what you need?

9.3 years ago by dariober 15k

0

Tophat is used for differential expression analysis if I am not wrong.

http://www.nature.com/nprot/journal/v7/n3/full/nprot.2012.016.html

Please guide me if I am wrong

updated 2.0 years ago by Ram 44k • written 9.3 years ago by Being Bioinformatician ▴ 250

0

The output of tophat is a BAM file of reads aligned to the reference genome; in itself it doesn't tell you anything about differences between samples. Typically you use the BAM files in downstream analyses to obtain differential expression. If you use edgeR or DESeq for DE, a typical pipeline might be: for each sample, align reads to the reference (e.g. with tophat) -> count reads in genes (e.g. with htseq-count) -> produce a count matrix (rows: genes, columns: samples) -> detect DE between groups via a GLM (e.g. with edgeR). (Each step can have several variations.)
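The "produce a count matrix" step can be sketched in a few lines of Python. This is only an illustration, assuming each sample has a two-column, tab-separated htseq-count output (gene ID, count) whose summary rows start with "__"; the sample names and counts are made up, and the function names are hypothetical.

```python
import csv
import io


def read_htseq_counts(handle):
    """Parse one htseq-count output: 'gene<TAB>count' per line."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 2 or row[0].startswith("__"):
            continue  # skip blank lines and htseq-count summary rows
        counts[row[0]] = int(row[1])
    return counts


def build_count_matrix(per_sample):
    """Merge {sample: {gene: count}} into (genes, samples, rows).

    Rows are genes, columns are samples. Genes missing from a
    sample are given a count of 0.
    """
    samples = list(per_sample)
    genes = sorted(set().union(*(c.keys() for c in per_sample.values())))
    rows = [[per_sample[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, rows


if __name__ == "__main__":
    # Made-up counts standing in for real htseq-count files.
    ctrl = io.StringIO("geneA\t10\ngeneB\t0\n__no_feature\t5\n")
    treat = io.StringIO("geneA\t25\ngeneB\t3\n__no_feature\t7\n")
    genes, samples, rows = build_count_matrix(
        {"ctrl_1": read_htseq_counts(ctrl), "treat_1": read_htseq_counts(treat)}
    )
    print("gene\t" + "\t".join(samples))
    for gene, row in zip(genes, rows):
        print(gene + "\t" + "\t".join(map(str, row)))
```

The resulting table (written to a file) is the kind of raw-count input that edgeR or DESeq expect; the DE test itself is still done in R.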

9.3 years ago by dariober 15k

score 2 · Answer 1 · 2015-07-21

2

9.3 years ago

DG 7.3k

There is also Sailfish: http://www.cs.cmu.edu/~ckingsf/software/sailfish/

Not sure about its RAM requirements.

9.3 years ago by DG 7.3k

Ram · Answer 2 · 2015-07-21

It seems that the latest improvement in mapping RNA-Seq reads is kallisto:

kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment. On benchmarks with standard RNA-Seq data, kallisto can quantify 30 million human reads in less than 3 minutes on a Mac desktop computer using only the read sequences and a transcriptome index that itself takes less than 10 minutes to build. Pseudoalignment of reads preserves the key information needed for quantification, and kallisto is therefore not only fast, but also as accurate as existing quantification tools. In fact, because the pseudoalignment procedure is robust to errors in the reads, in many benchmarks kallisto significantly outperforms existing tools.
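To give a feel for what "compatibility of reads with targets" means, here is a toy Python sketch. It is not kallisto's implementation (kallisto indexes a transcriptome de Bruijn graph and is far more efficient); it only illustrates the idea of intersecting, over a read's k-mers, the sets of transcripts that contain each k-mer. The sequences and function names are made up, only the forward strand is considered, and skipping k-mers that are absent from the index is an assumption of this sketch.

```python
from collections import defaultdict


def build_kmer_index(transcripts, k):
    """Map every k-mer to the set of transcript names that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def pseudoalign(read, index, k):
    """Return the set of transcripts compatible with the read.

    The result is the intersection of the transcript sets of all
    k-mers of the read that are present in the index. K-mers not in
    the index (e.g. caused by a sequencing error) are skipped.
    """
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            continue  # unseen k-mer: ignore it in this toy version
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible if compatible is not None else set()


if __name__ == "__main__":
    # Two made-up transcripts sharing the prefix ACGTAC.
    transcripts = {"t1": "ACGTACGGTC", "t2": "ACGTACCTTA"}
    index = build_kmer_index(transcripts, k=5)
    for read in ("ACGTAC", "GTACGGT", "TTTTTTT"):
        print(read, sorted(pseudoalign(read, index, k=5)))
```

No base-level alignment is computed at any point; the only output per read is the set of transcripts it could have come from, which is what a quantification step then works with.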

Ram · Answer 3 · 2015-07-21

0

9.3 years ago

EVR ▴ 610

Hi,

To find differentially expressed genes there are a lot of methods, such as DESeq2, edgeR, limma+voom, etc. In terms of tools, I would recommend Chipster, which has all of these options. Give it a try.

updated 2.0 years ago by Ram 44k • written 9.3 years ago by EVR ▴ 610

Ram · Answer 4 · 2015-07-21

We use the RSEM algorithm: http://deweylab.biostat.wisc.edu/rsem/. This is the same tool that was used for the large TCGA dataset. We've implemented it in our products as well (Array Studio, OncoLand, ImmunoLand), and used it to recount (after remapping with OSA) all the TCGA data.

We then use DESeq2, as the user above recommended (we also have a reimplementation of it in our systems).

Ram · Answer 5 · 2015-07-21

BBMap and Seal can directly output FPKMs and coverage when mapping to a transcriptome. BBMap uses more memory than Tophat, but Seal's memory usage is adjustable (tradeoff with sensitivity) using the rskip flag. It's extremely fast.

Usage:

seal.sh ref=transcriptome.fa in=reads.fq fpkm=fpkm.txt rskip=30 prealloc

That command should use approximately 0.5 bytes per reference base.
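As a rough back-of-the-envelope check, the 0.5 bytes per base figure can be turned into a memory estimate for a given reference. The Python sketch below simply counts bases in a FASTA stream and multiplies; the 0.5 factor is the approximate number quoted above for this command, the function names are hypothetical, and any fixed overhead of the program itself is ignored.

```python
import io

BYTES_PER_BASE = 0.5  # approximate figure quoted above for rskip=30


def count_reference_bases(fasta_handle):
    """Count sequence characters in a FASTA stream, ignoring header lines."""
    total = 0
    for line in fasta_handle:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def estimate_index_bytes(n_bases, bytes_per_base=BYTES_PER_BASE):
    """Rough memory estimate for the reference index, in bytes."""
    return n_bases * bytes_per_base


if __name__ == "__main__":
    # Tiny made-up FASTA standing in for transcriptome.fa.
    toy = io.StringIO(">tx1\nACGTACGT\nACGT\n>tx2\nGGCC\n")
    n = count_reference_bases(toy)
    print(n, "bases ->", estimate_index_bytes(n), "bytes")
    # A hypothetical 300-million-base transcriptome, for scale:
    print(estimate_index_bytes(300_000_000) / 1e6, "MB")
```

By this estimate, a 300-million-base reference would need on the order of 150 MB for the index.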

Ram · Answer 6 · 2015-07-22

Hi,

If you want to determine differentially expressed transcripts after de novo transcriptome assembly, I recommend Corset ("Corset: enabling differential gene expression analysis for de novo assembled transcriptomes"). Once you have the RNA-Seq count data, you can try any of these packages: DESeq2, edgeR, limma, voom, etc.

Corset: https://github.com/Oshlack/Corset/wiki

Ram · Answer 7 · 2015-07-22

Neither program is used for DEG analysis itself. Tophat is an aligner, used to align reads to a reference genome. Trinity is a software platform for RNA-seq analysis without a reference genome; it combines other useful tools such as bowtie, RSEM, etc. In RNA-seq analysis with a reference genome, cufflinks is commonly used to calculate FPKM values for specific genes or isoforms, and R packages (e.g. from Bioconductor) are then used for DEG analysis based on those FPKM values. In RNA-seq analysis without a reference genome, RSEM and eXpress are used to calculate FPKM values, and R packages are again used for DEG analysis. I hope this helps.
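For reference, the FPKM value mentioned above (fragments per kilobase of transcript per million mapped fragments) can be written out as a small Python function. This is only the plain textbook definition; tools such as cufflinks and RSEM use effective lengths and statistical models to assign ambiguous reads, so their numbers will differ. The gene names, counts, and lengths below are made up, and taking the library size as the sum of the supplied counts is an assumption of this sketch.

```python
def fpkm(fragment_counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments.

    fragment_counts: {feature: fragments assigned to it}
    lengths_bp:      {feature: length in base pairs}
    The library size is taken as the sum of the counts passed in.
    """
    total = sum(fragment_counts.values())
    if total == 0:
        return {name: 0.0 for name in fragment_counts}
    return {
        # count / (length in kb) / (library size in millions)
        name: count * 1e9 / (lengths_bp[name] * total)
        for name, count in fragment_counts.items()
    }


if __name__ == "__main__":
    counts = {"geneA": 500, "geneB": 1500}
    lengths = {"geneA": 1000, "geneB": 3000}
    for name, value in fpkm(counts, lengths).items():
        print(name, value)
```

In this toy example geneB has three times the fragments of geneA but is also three times as long, so both end up with the same FPKM, which is the point of the length normalization.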