So, there are papers on designing an RNA-seq experiment and on normalizing the data (Bullard et al. and the recent Genetics paper are good reads), but what do folks do for the actual pipeline?
I'm looking at:
1. filter on quality (what are your quality/parameter cutoffs?)
2. any other pre-processing?
3. tophat
4. cufflinks
5. repeat 1-4 for a different set of reads and find differentially expressed genes (cuffdiff)
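To make the quality-filter step concrete, here's a minimal sketch of the kind of thing I have in mind. It assumes 4-line FASTQ records with Phred+33 quality encoding, and the cutoffs (mean quality of 20, minimum length of 25) are placeholders I made up, not recommendations. The function names are hypothetical too.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, seq, qual) from strictly 4-line FASTQ records."""
    it = iter(lines)
    # zip over the same iterator four times groups the lines in fours;
    # an incomplete trailing record is silently dropped.
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (offset 33 is an assumption)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(records: Iterable[Record],
                 min_mean_q: float = 20.0,   # placeholder cutoff
                 min_len: int = 25) -> Iterator[Record]:  # placeholder cutoff
    """Keep reads that are long enough and have a high enough mean quality."""
    for header, seq, qual in records:
        if len(seq) >= min_len and mean_quality(qual) >= min_mean_q:
            yield header, seq, qual


if __name__ == "__main__":
    demo = ["@r1\n", "ACGT\n", "+\n", "IIII\n",
            "@r2\n", "ACGT\n", "+\n", "####\n"]
    for header, seq, qual in filter_reads(parse_fastq(demo), min_len=4):
        print(header, seq, qual)
```

Filtering on the mean is only one option; trimming low-quality 3' ends instead of dropping whole reads would be another, which is partly why I'm asking what cutoffs people actually use.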
First, any steps I should add?
Second, there doesn't seem to be much about how to do this. I mean, I can read the manuals and execute the commands (steps 3, 4 seem no problem), but I'm looking for pointers to any of:
- fully documented pipelines with an explanation of the processing at each step
- shell script(s) going from reads to differentially expressed genes.
- pubs where this is documented.
I realize each set of data will be different, but it'd be nice to base it on something.
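For reference, this is roughly the skeleton I'd start from: a sketch that only builds the tophat, cufflinks, and cuffdiff command lines so they can be reviewed before anything runs. The sample names, file paths, index name, and annotation GTF are all hypothetical. I'm also assuming a TopHat version that writes `accepted_hits.bam` into its output directory, and that cuffdiff is given an existing annotation GTF rather than the per-sample cufflinks assemblies.

```python
import shlex
from typing import Dict, List


def build_commands(samples: Dict[str, str],
                   bowtie_index: str,
                   annotation_gtf: str,
                   out_dir: str = "rnaseq_out",
                   threads: int = 4) -> List[List[str]]:
    """Return the command lines for a tophat -> cufflinks -> cuffdiff run.

    samples maps a sample name to its (already quality-filtered) FASTQ file.
    Nothing is executed here; the caller decides how to run the commands.
    """
    commands: List[List[str]] = []
    bams: List[str] = []
    for name, fastq in samples.items():
        tophat_out = f"{out_dir}/{name}/tophat"
        cufflinks_out = f"{out_dir}/{name}/cufflinks"
        # Assumption: TopHat writes its alignments to accepted_hits.bam.
        bam = f"{tophat_out}/accepted_hits.bam"
        commands.append(["tophat", "-p", str(threads), "-o", tophat_out,
                         bowtie_index, fastq])
        commands.append(["cufflinks", "-p", str(threads), "-o", cufflinks_out,
                         bam])
        bams.append(bam)
    # One cuffdiff call comparing all samples against the same annotation.
    commands.append(["cuffdiff", "-p", str(threads),
                     "-o", f"{out_dir}/cuffdiff", annotation_gtf, *bams])
    return commands


if __name__ == "__main__":
    cmds = build_commands(
        {"control": "control.fastq", "treated": "treated.fastq"},  # hypothetical
        bowtie_index="genome_index",                                # hypothetical
        annotation_gtf="genes.gtf",                                 # hypothetical
    )
    for cmd in cmds:
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the printed commands look right. What I can't tell from the manuals is which non-default options matter, which is the part I'm hoping someone has documented.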
Hadn't heard of GNUMAP; checking it out now. I'm not expecting an out-of-the-box solution, just trying to make use of existing knowledge.