Hi all,
I've been looking around and there doesn't seem to be much information on the development of RNA-Seq pipelines for differential expression analysis. I am about to start work on setting up a basic skeleton pipeline.
As a very high level overview, how do the following steps look? Can anyone comment on/add/remove steps? I have also added some questions regarding the steps to aid in my own understanding.
1) Align reads to genome using TopHat/Bowtie
(Perhaps use the new TopHat-Fusion to find fusion transcripts? I also guess it is important that the genome we align to matches our preferred annotation source downstream. For example, if we prefer Ensembl, we should use the GRCh37 (NCBI build 37) files rather than UCSC hg19, so that chromosome names stay consistent, e.g. "1" versus "chr1"?)
2) Mark/remove duplicate reads.
3) Use Cufflinks to assemble transcripts.
4) Run Cuffdiff to assess differential expression.
5) Annotate transcripts
(I am unsure how exactly this is done. Can anyone comment? For example, what program might be used, and what happens when we attempt to annotate novel-spliced or fully novel transcripts? Will these be recognised somehow?)
6) At this point I guess we have an annotated matrix that could be used in next gen or classical visualisation programs? Any suggestions on how to view?
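To make the outline above concrete, here is a minimal Python sketch of steps 1 to 4. It only builds the command lines and does not run anything. The file names, sample labels and output layout are hypothetical, samtools rmdup is just one option for step 2 (the post does not name a tool), and for simplicity Cuffdiff is given the reference GTF rather than a merged assembly. The small naming_style helper is my own illustration of the chromosome-name check from step 1.

```python
import shlex


def naming_style(chrom_names):
    """Classify chromosome names: 'ucsc' (chr1), 'ensembl' (1) or 'mixed'."""
    prefixed = [name.startswith("chr") for name in chrom_names]
    if all(prefixed):
        return "ucsc"
    if not any(prefixed):
        return "ensembl"
    return "mixed"


def build_pipeline(index, gtf, samples, outdir="out"):
    """Build (but do not run) the commands for steps 1-4.

    samples maps a condition label to a single-end FASTQ file
    (hypothetical names; paired-end data would need two files per sample).
    """
    commands, bams = [], []
    for label, fastq in samples.items():
        run_dir = f"{outdir}/{label}"
        # Step 1: splice-aware alignment to the genome, guided by the annotation.
        commands.append(["tophat", "-o", run_dir, "-G", gtf, index, fastq])
        # Step 2: remove duplicates (-s because the reads are single-end here).
        dedup = f"{run_dir}/dedup.bam"
        commands.append(
            ["samtools", "rmdup", "-s", f"{run_dir}/accepted_hits.bam", dedup]
        )
        # Step 3: assemble transcripts, using the annotation as a guide.
        commands.append(["cufflinks", "-o", f"{run_dir}/cufflinks", "-g", gtf, dedup])
        bams.append(dedup)
    # Step 4: differential expression across conditions. The reference GTF is
    # used here; a merged assembly could be substituted to test novel transcripts.
    commands.append(
        ["cuffdiff", "-o", f"{outdir}/cuffdiff", "-L", ",".join(samples), gtf, *bams]
    )
    return commands


if __name__ == "__main__":
    samples = {"control": "control.fastq", "treated": "treated.fastq"}
    for cmd in build_pipeline("GRCh37_index", "genes.gtf", samples):
        print(shlex.join(cmd))
```

Checking naming_style on the chromosome names in the genome index and in the GTF before step 1 would catch a "chr1" versus "1" mismatch early, rather than after a long alignment run.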
@Travis: BWA does gapped alignment, but the gaps are on the order of 1-10 bp; BWA does not handle gaps the size of introns. You need to use a splice-aware aligner when aligning to the genome. See my answer and Mikael's above for some aligner suggestions.
Why do Bowtie or BWA only make sense if mapping to the transcriptome?
In the genome, exons are separated by introns, and Bowtie and BWA cannot map a read that spans an exon-exon junction across one of those introns.
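A toy illustration of the point, with made-up sequences: a read that spans an exon-exon junction is a contiguous substring of the transcript but not of the genome, so it only maps to the genome if the aligner is allowed to split it across the intron. The spliced_find function is a naive stand-in I wrote for this example and is not how TopHat or any real aligner works.

```python
# Hypothetical sequences: two exons separated by one intron.
exon1 = "ATGGCCATTGTAATGGGCCGC"
intron = "GTAAGTCTTTCCCTTTTAG"
exon2 = "TGAAAGGGTGCCCGATAG"

genome = exon1 + intron + exon2   # what a genome aligner sees
transcript = exon1 + exon2        # what a transcriptome aligner sees

# A 16 bp read covering the last 8 bases of exon 1 and the first 8 of exon 2.
read = exon1[-8:] + exon2[:8]


def spliced_find(read, genome, min_anchor=4):
    """Naive spliced search: try each split point and look for the two halves
    in order. Returns (left_start, left_end, right_start) or None."""
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        start = genome.find(left)  # first occurrence only; fine for a toy
        if start == -1:
            continue
        right_start = genome.find(right, start + len(left))
        if right_start != -1:
            return start, start + len(left), right_start
    return None


if __name__ == "__main__":
    print("contiguous hit in transcript:", read in transcript)
    print("contiguous hit in genome:", read in genome)
    hit = spliced_find(read, genome)
    if hit is not None:
        left_start, left_end, right_start = hit
        print("spliced hit, gap length:", right_start - left_end)
```

The gap found by the spliced search is the intron, which in real genomes is typically far longer than the few-base gaps that BWA's gapped alignment allows.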
But BWA does do gapped alignment, doesn't it?
Thanks for that. Off the cuff, it makes me wonder why anyone would use bowtie for RNA-Seq!
The more I think about this, the more I have to ask - do aligners like Bowtie/Eland discard intron-spanning reads?
Yes, but Solexa reads used to be shorter, so junction-spanning reads were uncommon at ~30 bp. Software is always fighting the last war.
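A rough back-of-the-envelope sketch of why read length matters here. It assumes reads start uniformly at random along a single spliced transcript. The exon lengths are hypothetical, and expression level, fragment size and mapping requirements are all ignored, so treat the numbers as illustrative only.

```python
def fraction_spanning(exon_lengths, read_len):
    """Fraction of read start positions whose read crosses at least one
    exon-exon junction, for reads sampled uniformly from one transcript."""
    total = sum(exon_lengths)
    if read_len > total:
        raise ValueError("read is longer than the transcript")
    # Junction coordinates in transcript space (boundary before base j).
    junctions = []
    pos = 0
    for length in exon_lengths[:-1]:
        pos += length
        junctions.append(pos)
    starts = range(total - read_len + 1)
    # A read starting at s covers bases s .. s + read_len - 1, so it crosses
    # junction j when it has at least one base on each side of j.
    spanning = sum(
        any(s < j < s + read_len for j in junctions) for s in starts
    )
    return spanning / len(starts)


if __name__ == "__main__":
    exons = [150] * 8  # hypothetical transcript: eight 150 bp exons
    for read_len in (30, 50, 75, 100):
        frac = fraction_spanning(exons, read_len)
        print(f"{read_len} bp reads: {frac:.1%} span a junction")
```

Under this toy model the spanning fraction grows roughly in proportion to read length, so an aligner that silently drops junction reads loses more and more data as reads get longer.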