RNA-Seq Pipeline
14.6 years ago
brentp 24k

So, there are papers on designing an RNA-seq experiment and on normalizing the data (Bullard et al. and the recent Genetics paper are good reads), but what do folks do for the actual pipeline?

I'm looking at

  1. filter on quality (what are your quality/parameter cutoffs?)
  2. any other pre-processing?
  3. tophat
  4. cufflinks
  5. repeat 1-4 for a different set of reads and find differentially expressed genes (cuffdiff)
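For step 1, a minimal Python sketch of a mean-quality read filter. It assumes plain four-line FASTQ records and Phred+33 (Sanger / Illumina 1.8+) quality encoding; the threshold of 20 and the function names are illustrative choices, not cutoffs recommended anywhere in this thread.

```python
from typing import Iterable, Iterator, List, Tuple

# (header, sequence, quality string) for one FASTQ record
Record = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from plain 4-line FASTQ (no wrapped sequence lines)."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # skip stray blank lines between records
        seq = next(it)
        next(it)  # '+' separator line, ignored
        qual = next(it)
        yield header, seq, qual


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a quality string; offset 33 assumes Sanger / Illumina 1.8+ encoding."""
    return [ord(c) - offset for c in qual]


def filter_by_mean_quality(records: Iterable[Record], min_mean_q: float = 20.0) -> Iterator[Record]:
    """Keep reads whose mean Phred score is at least min_mean_q (20 is an arbitrary example)."""
    for header, seq, qual in records:
        scores = phred_scores(qual)
        if scores and sum(scores) / len(scores) >= min_mean_q:
            yield header, seq, qual


if __name__ == "__main__":
    example = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "####"]
    for header, seq, qual in filter_by_mean_quality(parse_fastq(example)):
        print(header, seq)  # only @r1 passes: 'I' is Q40, '#' is Q2
```

With a real file you would pass an open handle, e.g. parse_fastq(open("reads.fq")), where the file name is a placeholder.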

First, any steps I should add?

Second, there doesn't seem to be much written about how to do this. I can read the manuals and execute the commands (steps 3 and 4 seem to be no problem), but I'm looking for any pointers to:

  1. fully documented pipelines with an explanation of the processing at each step
  2. shell script(s) going from reads to differentially expressed genes
  3. pubs where this is documented.

I realize each set of data will be different, but it'd be nice to base it on something.
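For pointer 2, a bare-bones Python sketch that only builds and prints the command lines for a TopHat, Cufflinks, Cuffmerge and Cuffdiff run; nothing is executed. It assumes one single-end FASTQ per replicate, placeholder file names (genome_index, genes.gtf, genome.fa and the output directories) and the flags -p, -G, -o, -g, -s and -L as spelled in the TopHat 2 / Cufflinks 2 era; check them against each tool's --help for your installed versions. Cuffmerge also needs assemblies.txt, a text file listing the Cufflinks GTFs one per line, which the sketch returns but does not write.

```python
import shlex
from typing import Dict, List, Tuple


def tuxedo_commands(
    bowtie_index: str,
    genes_gtf: str,
    genome_fa: str,
    samples: Dict[str, List[str]],
    threads: int = 4,
) -> Tuple[List[List[str]], List[str]]:
    """Build (but do not run) TopHat/Cufflinks/Cuffmerge/Cuffdiff command lines.

    samples maps a condition label to its FASTQ files, one file per replicate.
    Returns the commands in run order plus the assembly GTFs that must be
    listed, one per line, in assemblies.txt before cuffmerge runs.
    All file and directory names here are placeholders.
    """
    p = str(threads)
    commands: List[List[str]] = []
    bams: Dict[str, List[str]] = {}
    assemblies: List[str] = []
    for condition, fastqs in samples.items():
        for rep, fastq in enumerate(fastqs, start=1):
            name = f"{condition}_rep{rep}"
            bam = f"tophat_{name}/accepted_hits.bam"  # TopHat's default output BAM name
            commands.append(["tophat", "-p", p, "-G", genes_gtf,
                             "-o", f"tophat_{name}", bowtie_index, fastq])
            commands.append(["cufflinks", "-p", p, "-o", f"cufflinks_{name}", bam])
            bams.setdefault(condition, []).append(bam)
            assemblies.append(f"cufflinks_{name}/transcripts.gtf")
    commands.append(["cuffmerge", "-p", p, "-g", genes_gtf, "-s", genome_fa,
                     "-o", "merged_asm", "assemblies.txt"])
    # Replicates of one condition are comma-separated; conditions are separate arguments.
    commands.append(["cuffdiff", "-p", p, "-o", "cuffdiff_out",
                     "-L", ",".join(bams), "merged_asm/merged.gtf"]
                    + [",".join(files) for files in bams.values()])
    return commands, assemblies


if __name__ == "__main__":
    cmds, gtfs = tuxedo_commands(
        "genome_index", "genes.gtf", "genome.fa",
        {"control": ["ctrl_1.fq", "ctrl_2.fq"], "treated": ["trt_1.fq", "trt_2.fq"]},
    )
    print("# assemblies.txt should contain:", *gtfs, sep="\n#   ")
    for cmd in cmds:
        print(shlex.join(cmd))  # shlex.join needs Python 3.8+
```

Printing the commands instead of running them makes it easy to paste them into a shell script or a job scheduler and to review every step before committing compute time.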

pipeline next-gen-sequencing rna rna-seq
14.6 years ago
Dstan ▴ 160

We're getting ready to publish a study in which we used RNA-seq, with a piece of software called GNUMAP for mapping. We did not apply any quality filtering to the reads, as we found that lower-quality reads simply didn't map as well. As for the post-mapping analysis, we're still waiting to hear back from our statistics colleagues on the model they've developed.

As for an out-of-the-box solution for RNA-seq, I'm not sure how much you'll be able to find.


Hadn't heard of GNUMAP; checking it out now. I'm not expecting an out-of-the-box solution, just trying to make use of existing knowledge.

14.6 years ago
Wjeck ▴ 490

No idea where these steps exist as a well-documented whole, but I can pass on our experience. We're doing a pretty massive amount of RNA-seq at our institution as part of The Cancer Genome Atlas, and our methods are along the lines you describe.

Bowtie/TopHat has been our best bet for spliced alignment. I know the group working on this tried other approaches, such as mapping onto a reference "transcriptome", which has some advantages for mapping but can be harder to deconvolute where transcripts overlap.
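Why overlapping transcripts are harder to deconvolute can be illustrated with a toy expectation-maximization (EM) estimate: each read is recorded as the set of transcripts it is compatible with, and reads shared by several transcripts are split in proportion to the current abundance estimates. This is a simplified sketch, not the model of any tool mentioned here; it ignores transcript length and the fragment-length distribution, which Cufflinks' likelihood model does take into account.

```python
from typing import Dict, FrozenSet, List


def em_abundance(reads: List[FrozenSet[str]], n_iter: int = 100) -> Dict[str, float]:
    """Toy EM: estimate relative transcript abundances from read-transcript compatibility.

    Each read is the (non-empty) set of transcripts it aligns to equally well.
    Ignores transcript length and fragment lengths, so this is only an illustration.
    """
    transcripts = sorted(set().union(*reads))
    theta = {t: 1.0 / len(transcripts) for t in transcripts}  # start from uniform abundances
    for _ in range(n_iter):
        expected = {t: 0.0 for t in transcripts}
        # E-step: split each read among its compatible transcripts in proportion to theta
        for compatible in reads:
            total = sum(theta[t] for t in compatible)
            for t in compatible:
                expected[t] += theta[t] / total
        # M-step: new abundance = expected fraction of reads assigned to each transcript
        theta = {t: expected[t] / len(reads) for t in transcripts}
    return theta


if __name__ == "__main__":
    shared = frozenset({"A", "B"})
    reads = [frozenset({"A"}), frozenset({"A"}), shared, shared]
    print(em_abundance(reads))  # B is supported only by shared reads, so its share shrinks towards 0
```

The example shows the core difficulty: when a transcript is supported only by reads it shares with an overlapping isoform, its estimate depends entirely on how the shared reads are apportioned.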


Thanks; at least it's good to know you settled on a similar overall pipeline after looking around.

14.5 years ago
Michael 55k

I think one important step that may be missing here is

  1. remove/condense (100%?) identical reads into one read

in the filtering step. A large number of reads could be, e.g., artifacts of a PCR step in the wet-lab pipeline. This can be done, e.g., with the FASTA collapser tool (fastx_collapser) from the FASTX-Toolkit. For a quantitative approach I would prefer this, but I guess it's controversial. Any experiences with that?
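Collapsing 100% identical reads amounts to counting distinct sequences. The sketch below does this in memory with collections.Counter and writes one FASTA record per distinct sequence with its copy number in the header; the header format is made up for illustration and is not fastx_collapser's exact output. Note that exact matching treats a PCR duplicate that picked up a sequencing error as a distinct read.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def collapse_identical(sequences: Iterable[str]) -> List[Tuple[str, int]]:
    """Collapse 100% identical sequences into (sequence, copy_count) pairs.

    Most abundant first; ties broken alphabetically so the output is deterministic.
    """
    counts = Counter(sequences)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def to_fasta(collapsed: List[Tuple[str, int]]) -> str:
    """Write one FASTA record per distinct sequence.

    The '>rank_xCOUNT' header is a made-up format for illustration only.
    """
    lines = []
    for rank, (seq, count) in enumerate(collapsed, start=1):
        lines.append(f">{rank}_x{count}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC", "TTGACA", "ACGTAC", "GGCATT"]
    print(to_fasta(collapse_identical(reads)), end="")
```

Keeping the copy count in the header means the collapse is reversible, so a quantitative analysis can still choose whether to use the collapsed or the original counts.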

Another filtering step could be to clip low-quality regions from the reads instead of only removing whole reads.
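One common way to clip low-quality 3' ends instead of discarding whole reads is the running-sum algorithm used by BWA's read trimming and documented by cutadapt for its -q option: scanning from the 3' end, add (cutoff - q) at each base, stop once the sum turns negative, and cut where the sum peaked. A minimal Python sketch, assuming Phred+33 quality strings and an illustrative cutoff of 20:

```python
from typing import List, Tuple


def quality_trim_index(scores: List[int], cutoff: int = 20) -> int:
    """Return n such that scores[:n] is kept (BWA-style 3' quality trimming).

    Walk from the 3' end summing (cutoff - q); stop when the sum goes negative
    and cut where the running sum was largest.
    """
    running = 0
    best = 0
    cut = len(scores)  # default: keep everything
    for i in range(len(scores) - 1, -1, -1):
        running += cutoff - scores[i]
        if running < 0:
            break
        if running > best:
            best = running
            cut = i
    return cut


def trim_read(seq: str, qual: str, cutoff: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Trim low-quality 3' bases; offset 33 assumes Phred+33 quality strings."""
    scores = [ord(c) - offset for c in qual]
    n = quality_trim_index(scores, cutoff)
    return seq[:n], qual[:n]


if __name__ == "__main__":
    print(trim_read("ACGTA", "III+&"))  # the last two bases (Q10, Q5) are clipped
```

Unlike cutting at the first base below the cutoff, the running sum tolerates an isolated low-quality base inside an otherwise good stretch.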


My understanding is that removing identical reads is a step that is typical for DNA analysis, but more controversial when it comes to RNA-Seq because the rationale for it is less clear here (are we only removing PCR artifacts, or also introducing a quantitative bias?).


Note: I wrote this almost 3 years ago. Now I wouldn't do it anymore for a differential analysis, on the argument that, on average, PCR artifacts should affect both conditions equally. That's possibly still controversial.


I'll admit that I didn't see the date of the original answer :)

9.3 years ago

We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at http://www.rnaseq.wiki/.

This material was released alongside this publication:

Malachi Griffith, Jason R. Walker, Nicholas C. Spies, Benjamin J. Ainscough, Obi L. Griffith. 2015. Informatics for RNA-seq: A web resource for analysis on the cloud. PLoS Computational Biology 11(8):e1004393.

The Supplementary Information for this publication includes an extensive review of RNA-seq wet lab and analysis concepts, existing tools, common questions, etc.

All materials associated with this publication, including high-resolution and original figure files, supplementary tables, etc., are available here

This publication was inspired by workshops that we have taught at CBW, CSHL, and NYGC over the last few years. These workshops are ongoing and we hope to maintain and expand the content in the coming years.

11.5 years ago
wadunn83 ▴ 90

For anyone still interested in this type of thing:

If you are using TopHat/Cufflinks, the authors generally do not recommend removing poor-quality reads, since their tools simply down-weight the alignments of poor-quality reads, and such reads can sometimes actually help.
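One way an aligner can down-weight low-quality bases, rather than requiring poor reads to be filtered out beforehand, is the quality-aware mismatch rule of Bowtie 1's -n alignment mode: the qualities at all mismatched positions may sum to at most the -e/--maqerr limit (documented default 70). The sketch below reproduces only that idea for an ungapped alignment and ignores details such as Bowtie's quality rounding; it is not TopHat's or Cufflinks' actual scoring, and which Bowtie mode a given TopHat version uses is not established here.

```python
from typing import List


def mismatch_quality_sum(read: str, ref: str, quals: List[int]) -> int:
    """Sum the Phred qualities at positions where read and reference disagree (ungapped)."""
    if not len(read) == len(ref) == len(quals):
        raise ValueError("read, reference window and qualities must be the same length")
    return sum(q for r, g, q in zip(read, ref, quals) if r != g)


def acceptable(read: str, ref: str, quals: List[int], max_total: int = 70) -> bool:
    """Accept the alignment if the mismatched bases' qualities sum to <= max_total.

    70 mirrors Bowtie 1's documented -e/--maqerr default; everything else is illustrative.
    """
    return mismatch_quality_sum(read, ref, quals) <= max_total


if __name__ == "__main__":
    ref = "ACGTTT"
    print(acceptable("ACGTAC", ref, [40, 40, 40, 40, 10, 10]))  # two mismatches at Q10: total 20, accepted
    print(acceptable("TCGTTA", ref, [40, 40, 40, 40, 10, 40]))  # two mismatches at Q40: total 80, rejected
```

The same number of mismatches is tolerated when they fall on low-confidence bases but not when they fall on high-confidence ones, which is why low-quality reads need not be discarded up front.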

As for 3-5:

I recently wrote a pipeline called Blacktie to do just this, plus some automated analysis with cummeRbund.

Installation via pip:

[sudo] pip install -U blacktie

Could you give a source for the statement above about pre-filtering reads for TopHat? I've been trying to learn about this topic and honestly haven't found a whole lot.

11.6 years ago
Biojl ★ 1.7k

You may want to take a look at The Simple Fool’s Guide to Population Genomics via RNA-Seq, from the Palumbi lab. It's a functional, fully documented pipeline from scratch.

http://sfg.stanford.edu/guide.html

Edit: OK, yes, I didn't see that this post was from 3 years ago.

11.6 years ago
xiangwulu ▴ 120
  1. FastQC could be used for quality control
  2. the adaptor may need to be removed before alignment, in case a long adaptor sequence affects the alignment result
  3. & 4. other aligners may be worth looking at, depending on the read length (BWA, Bowtie, BFAST)

list of alignment software:

http://en.wikipedia.org/wiki/List_of_sequence_alignment_software

http://elements.eaglegenomics.com/

list of adaptor removal software:

http://bioscholar.com/genomics/tools-remove-adapter-sequences-next-generation-sequencing-data/
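To show what adaptor removal does, a minimal exact-match sketch in Python: the read is cut at the leftmost position where either the whole adaptor occurs or a prefix of it (at least min_overlap bases) runs off the 3' end. The adaptor string AGATCGGAAGAGC (the common start of Illumina TruSeq adapters) and min_overlap=3 are assumptions; dedicated tools also allow mismatches and indels, which this sketch does not.

```python
def trim_3prime_adapter(seq: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter by exact matching.

    Cuts at the leftmost position where either the whole adapter occurs or a
    prefix of it (at least min_overlap bases) runs off the end of the read.
    No mismatches or indels are allowed, unlike dedicated trimming tools.
    """
    for start in range(len(seq)):
        overlap = min(len(seq) - start, len(adapter))
        if overlap < min_overlap:
            break  # remaining suffix too short to call an adapter
        if seq[start:start + overlap] == adapter[:overlap]:
            return seq[:start]
    return seq


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # assumed: common Illumina TruSeq adapter start
    for read in ["ACGTACGTAGATCGGAAGAGCTTT", "ACGTACGTAGATC", "ACGTACGTACGT"]:
        print(read, "->", trim_3prime_adapter(read, adapter))
```

A short min_overlap also clips reads that end in the first few adaptor bases purely by chance, which is the usual trade-off when choosing this parameter.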

9.8 years ago
Czh3 ▴ 190

This pipeline can help you do quality control, adapter trimming, mapping, transcript assembly, differential expression detection...

Try this: https://github.com/Czh3/NGSTools
