Best Workflow For Differential Expression Analysis Using De Novo Transcriptomes And Illumina Reads In 2014
10.6 years ago
Birdman ▴ 20

What is, in your opinion/experience, the best workflow for differential gene expression analysis using Illumina reads and a de novo transcriptome (e.g. generated by Trinity), without a reference genome?

Please suggest tools that are compatible with each other:

- (pre-processing of reads)
- Alignment tool
- Read summarization
- DE analysis

differential-expression workflow alignment • 5.3k views
10.6 years ago
seidel 11k

A fairly straightforward approach is to use your de novo transcriptome to create an alignment index (e.g. a bowtie index), then use bowtie to align your reads to that index. The SAM or BAM output can then be parsed for summarization, i.e. you count the number of reads mapping to each transcript in each sample to generate a count table. This could be done in Perl, Python, R, or whatever your favorite language is (I don't know of an off-the-shelf solution, but the SAM format is easily parsed). Normally we think of alignment results as having chromosomal coordinates, but if the alignment index consists of your transcripts, then each read maps to a transcript name rather than a chromosome, so you just have to count these names.

Once you have a count table, edgeR or DESeq are great options for identifying differentially expressed transcripts, as mentioned by User000. This is fairly generic, and if you're simply looking to discover differentially expressed transcripts under a given set of conditions, this approach is fine (it won't be your limiting step).

Depending on what you're trying to achieve, though, some things could be tricky, such as whether you should allow any multimapping in case your transcripts have a lot of redundant sequence. You also won't be taking advantage of reads that cross splice junctions, but you may not need to in order to simply find genes changing under condition X.
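The counting step described above can be sketched in Python roughly as follows. This is only a minimal illustration, not a tested pipeline: the function name `count_reads_per_transcript`, the `min_mapq` parameter, and the example records are all made up for the sketch. It assumes uncompressed SAM text (not BAM), counts each aligned mate separately for paired-end data, and skips secondary and supplementary records so that a multimapping read is counted at most once.

```python
from collections import Counter

def count_reads_per_transcript(sam_lines, min_mapq=0):
    """Count primary, mapped alignments per reference (transcript) name.

    sam_lines: any iterable of SAM text lines (e.g. an open file handle).
    min_mapq:  hypothetical knob for this sketch; raise it to drop
               ambiguously placed reads.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):           # header lines (@HD, @SQ, ...)
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:               # SAM has 11 mandatory columns
            continue
        flag = int(fields[1])
        rname = fields[2]                  # transcript name when the index is a transcriptome
        mapq = int(fields[4])
        if flag & 0x4 or rname == "*":     # read is unmapped
            continue
        if flag & 0x100 or flag & 0x800:   # secondary or supplementary alignment
            continue
        if mapq < min_mapq:
            continue
        counts[rname] += 1
    return counts

# Tiny made-up SAM content, purely for illustration.
example_sam = [
    "@SQ\tSN:contig_1\tLN:1200",
    "@SQ\tSN:contig_2\tLN:800",
    "\t".join(["r1", "0", "contig_1", "10", "42", "5M", "*", "0", "0", "ACGTA", "IIIII"]),
    "\t".join(["r2", "16", "contig_1", "50", "42", "5M", "*", "0", "0", "ACGTA", "IIIII"]),
    "\t".join(["r3", "4", "*", "0", "0", "*", "*", "0", "0", "ACGTA", "IIIII"]),
    "\t".join(["r4", "0", "contig_2", "7", "1", "5M", "*", "0", "0", "ACGTA", "IIIII"]),
    "\t".join(["r4", "256", "contig_1", "99", "1", "5M", "*", "0", "0", "ACGTA", "IIIII"]),
]

counts = count_reads_per_transcript(example_sam)
strict_counts = count_reads_per_transcript(example_sam, min_mapq=10)
```

With a real file you would pass the open handle instead of the list, e.g. `with open("sample.sam") as fh: counts = count_reads_per_transcript(fh)`, where `sample.sam` is a placeholder name. Whether to filter on mapping quality at all is exactly the multimapping judgment call mentioned above.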


I would recommend eXpress for the transcript summarization. For example, this should provide more robust results than something like idxstats:

http://cdwscience.blogspot.com/2014/02/mrna-quantification-via-express.html

10.6 years ago
User000 ▴ 710

Use DESeq, an R package, to test for differential expression (original paper: http://genomebiology.com/2010/11/10/R106). baySeq and edgeR are alternatives.


I asked for a workflow. Which alignment tool and read summarization tool do you use before DESeq?


I'm not sure about the alignment tool; I guess you can use BWA. Then extract count data from, e.g., the .sam files (you will need to write a script to do that), and finally use, e.g., DESeq to identify differentially expressed genes. You may also want to use ErmineJ to look at functional enrichment. I agree with cwarden45 that it is quite tricky, and I think it is also important to choose the right assembly method.
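Once each sample has its own per-transcript counts, they need to be merged into a single table of raw integer counts, which is the kind of input DESeq and edgeR expect. A rough Python sketch of that merge is below; the function names, sample names, and numbers are invented for illustration, and transcripts missing from a sample are assumed to have a count of 0.

```python
import csv
import io

def build_count_table(per_sample_counts):
    """Merge {sample: {transcript: count}} into a header plus rows.

    Transcripts absent from a sample are filled in with 0.
    """
    samples = sorted(per_sample_counts)
    transcripts = sorted({t for c in per_sample_counts.values() for t in c})
    rows = [
        [t] + [per_sample_counts[s].get(t, 0) for s in samples]
        for t in transcripts
    ]
    return samples, rows

def write_count_table(samples, rows, handle):
    """Write the table as tab-separated text, one transcript per line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["transcript"] + samples)
    writer.writerows(rows)

# Made-up counts for two hypothetical samples.
per_sample = {
    "control_1": {"contig_1": 12, "contig_2": 3},
    "treated_1": {"contig_1": 30, "contig_3": 7},
}

samples, rows = build_count_table(per_sample)
buffer = io.StringIO()
write_count_table(samples, rows, buffer)
table_text = buffer.getvalue()
```

Writing to a real file instead of the in-memory buffer gives a tab-separated table that can be read into R as a matrix with transcripts as rows and samples as columns.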

10.6 years ago

I think differential expression using the assembled contigs is a bit tricky. For example, see this related response:

A: Trinity/RSEM/edgeR pipeline...now what?
