This is how I would explain RNA-seq to someone who is new to the area.
Step 0: You have a hypothesis, and you have decided that RNA-seq is the ideal (or a novel) experiment for investigating it.
Step 1: Get your samples (case/control, tumor/normal, time series, ...), extract your RNA, and make sure you do all the QC.
- Library preparation: the key experimental step of RNA-seq. It largely determines the outcome of your experiment.
Step 2: Deep sequencing (read up on next-generation sequencing; you may use one of the recent NGS platforms for your sequencing. Read about them here). Make sure that you understand the lingua franca of NGS (for example: single-end vs. paired-end reads, coverage, etc.).
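A quick way to get a feel for "coverage" is the Lander-Waterman average, C = N × L / G: the number of sequenced bases divided by the size of the target. The sketch below takes the target size as an input you supply and counts each read pair as two reads in paired-end mode. For RNA-seq the relevant target is the expressed transcriptome rather than the genome, and real coverage is very uneven because it follows expression levels, so treat the result as a rough average only. The function name is hypothetical.

```python
def expected_coverage(n_fragments: int, read_length: int, target_size: int,
                      paired_end: bool = False) -> float:
    """Average fold coverage C = N * L / G (Lander-Waterman).

    n_fragments: reads for single-end runs, read pairs for paired-end runs.
    read_length: length of one read (one mate) in bases.
    target_size: size of the sequenced target in bases (an assumption you supply).
    """
    reads_per_fragment = 2 if paired_end else 1  # both mates add sequenced bases
    sequenced_bases = n_fragments * reads_per_fragment * read_length
    return sequenced_bases / target_size


if __name__ == "__main__":
    # Hypothetical run: 20 million 2 x 100 bp pairs over a 50 Mb target.
    print(expected_coverage(20_000_000, 100, 50_000_000, paired_end=True))  # 80.0
```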
Step 3: Analysis pipeline
The typical output of an RNA-seq experiment is a .fastq file of sequence reads (two files for a paired-end experiment). The downstream analysis is then designed around your biological question.
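To make the file format concrete, here is a minimal FASTQ reader in Python. It assumes the common four-line layout (a header starting with `@`, the sequence, a `+` separator, and a quality string of the same length) with no line wrapping, and that the two files of a paired-end run list mates in the same order. The function names are hypothetical; for real data an established parser is the safer choice.

```python
import io
from typing import Iterator, NamedTuple, TextIO, Tuple


class FastqRecord(NamedTuple):
    name: str       # header line without the leading '@'
    sequence: str   # bases
    quality: str    # one ASCII-encoded quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of file
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:].rstrip("\r\n"), sequence, quality)


def read_pairs(r1: TextIO, r2: TextIO) -> Iterator[Tuple[FastqRecord, FastqRecord]]:
    """Walk the two mate files of a paired-end run in lockstep."""
    mates2 = read_fastq(r2)
    for mate1 in read_fastq(r1):
        mate2 = next(mates2, None)
        if mate2 is None:
            raise ValueError("second file has fewer records than the first")
        yield mate1, mate2
    if next(mates2, None) is not None:
        raise ValueError("second file has more records than the first")


if __name__ == "__main__":
    r1 = io.StringIO("@read1/1\nACGTAC\n+\nIIIIII\n")
    r2 = io.StringIO("@read1/2\nGTACGT\n+\nIIIIHH\n")
    for m1, m2 in read_pairs(r1, r2):
        print(m1.name, m1.sequence, m2.name, m2.sequence)
```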
Here is a highly simplified conceptual framework for understanding RNA-seq analysis pipelines:
Primary analysis:
- QC: Quality control and removal of poor-quality reads, adapters and linkers
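As an illustration of what QC tools do under the hood, here is a deliberately simple read cleaner: it decodes Phred+33 quality characters, clips an adapter (full or partial) from the 3' end, trims a low-quality tail, and discards reads that end up too short. The adapter sequence, the quality cutoff of 20 and the minimum length of 20 are example values, not recommendations, and the function names are hypothetical. Dedicated trimmers handle mismatches within the adapter, smarter quality-trimming rules and paired reads far more carefully.

```python
from typing import List, Optional, Tuple

# Example adapter: the start of the common Illumina TruSeq adapter.
# Replace it with the adapter actually used in your library prep.
ADAPTER = "AGATCGGAAGAGC"


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string (Phred+33 by default) into integers."""
    return [ord(ch) - offset for ch in quality]


def trim_adapter(seq: str, qual: str, adapter: str = ADAPTER,
                 min_overlap: int = 3) -> Tuple[str, str]:
    """Clip an exact adapter match, or a partial adapter at the 3' end."""
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos], qual[:pos]
    # Read-through may leave only the first few adapter bases at the 3' end:
    # look for the longest adapter prefix that is also a suffix of the read.
    for k in range(min(len(adapter) - 1, len(seq)), max(min_overlap, 1) - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k], qual[:-k]
    return seq, qual


def trim_low_quality_tail(seq: str, qual: str, min_q: int = 20) -> Tuple[str, str]:
    """Drop 3' bases until the last remaining base has quality >= min_q."""
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def clean_read(seq: str, qual: str, min_q: int = 20,
               min_len: int = 20) -> Optional[Tuple[str, str]]:
    """Adapter clipping, then quality trimming; None if the read gets too short."""
    seq, qual = trim_adapter(seq, qual)
    seq, qual = trim_low_quality_tail(seq, qual, min_q)
    return (seq, qual) if len(seq) >= min_len else None
```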
Secondary analysis:
- Mapping: Find the location where each short read best matches the reference sequence. Ideally, increase the complexity of the mapping strategy progressively, so that reads left unaligned by a simple first pass get another chance (for example, with more permissive or splice-aware settings). This step is what turns millions of short reads into a quantification of expression.
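The idea of progressively more permissive mapping can be shown with a toy ungapped aligner: a k-mer index of the reference proposes candidate positions, mismatches are counted to verify them, and only the reads that fail at one mismatch level are retried at the next. This is a hedged sketch under strong simplifications (a single short reference string, no insertions or deletions, no splicing, no reverse complement); the function names, the seed length and the mismatch levels are made up for illustration, and real RNA-seq data calls for a proper splice-aware aligner.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def build_kmer_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to the positions where it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def map_read(read: str, reference: str, index: Dict[str, List[int]], k: int,
             max_mismatches: int) -> Optional[Tuple[int, int]]:
    """Best ungapped hit as (start, mismatches), or None.

    Non-overlapping k-mer seeds propose candidate starts; each candidate is then
    verified by counting mismatches. By the pigeonhole principle, a hit with
    fewer mismatches than there are seeds always contains an exact seed.
    """
    candidates = set()
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    best: Optional[Tuple[int, int]] = None
    for start in sorted(candidates):  # ties keep the leftmost position
        mm = hamming(read, reference[start:start + len(read)])
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (start, mm)
    return best


def progressive_map(reads: Sequence[Tuple[str, str]], reference: str, k: int = 4,
                    mismatch_levels: Sequence[int] = (0, 1, 2)):
    """Map with strict settings first, then retry only the leftovers more permissively."""
    index = build_kmer_index(reference, k)
    placed: Dict[str, Tuple[int, int]] = {}
    unaligned = list(reads)
    for max_mm in mismatch_levels:
        still_unaligned = []
        for name, seq in unaligned:
            hit = map_read(seq, reference, index, k, max_mm)
            if hit is None:
                still_unaligned.append((name, seq))
            else:
                placed[name] = hit
        unaligned = still_unaligned
    return placed, unaligned
```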
- Summarization: Aggregate the sequence reads over biological units (exons, transcripts, genes). This is where you bring biological context to your sequencing reads.
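Summarization can be pictured as a table-building loop. The sketch below assumes reads have already been aligned and reduced to intervals on one chromosome and strand, and that genes are given as lists of exon intervals (both hypothetical inputs). It counts a read for a gene only when the read overlaps that gene's exons and no other gene's, and sets aside reads with no overlap or an ambiguous one. This is one simple counting rule among several, and the linear scan over all genes is only practical for toy data.

```python
from collections import Counter
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the same chromosome/strand


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def count_reads_per_gene(reads: List[Interval], genes: Dict[str, List[Interval]]):
    """Count reads per gene; a read counts only if it hits exons of exactly one gene."""
    counts = {gene: 0 for gene in genes}
    unassigned: Counter = Counter()
    for read in reads:
        hits = {gene for gene, exons in genes.items()
                if any(overlaps(read, exon) for exon in exons)}
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif not hits:
            unassigned["no_feature"] += 1
        else:
            unassigned["ambiguous"] += 1
    return counts, dict(unassigned)
```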
- Normalization: This step lets you compare expression levels between conditions (for example, cases vs. controls) and within them (for example, across biological or technical replicates). Several approaches are available, for example RPKM (single-end reads), FPKM (paired-end reads), quantile normalization, housekeeping-gene normalization, etc.
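Two of the approaches named above are easy to write down. RPKM divides a feature's read count by its length in kilobases and by the total number of mapped reads in millions; FPKM is the same calculation on fragments, where a read pair counts once. Quantile normalization instead forces all samples to share one value distribution. The functions below are minimal sketches of those definitions with hypothetical names, not the implementation of any particular tool; in particular, ties are broken by row order rather than averaged.

```python
from typing import List, Sequence


def rpkm(count: int, length_bp: int, total_mapped: int) -> float:
    """Reads per kilobase of feature per million mapped reads.

    RPKM = count / (length_bp / 1e3) / (total_mapped / 1e6)
         = count * 1e9 / (length_bp * total_mapped)
    Passing fragment (read-pair) counts instead of read counts gives FPKM.
    """
    return count * 1e9 / (length_bp * total_mapped)


def quantile_normalize(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Quantile-normalize a genes x samples matrix (rows = genes, columns = samples).

    Every sample ends up with the same value distribution: the mean, across
    samples, of the values at each rank. Ties are broken by row order rather
    than averaged, which keeps the sketch short.
    """
    n_genes, n_samples = len(matrix), len(matrix[0])
    sorted_cols = [sorted(row[j] for row in matrix) for j in range(n_samples)]
    rank_means = [sum(col[r] for col in sorted_cols) / n_samples for r in range(n_genes)]
    result = [[0.0] * n_samples for _ in range(n_genes)]
    for j in range(n_samples):
        order = sorted(range(n_genes), key=lambda i: matrix[i][j])
        for rank, i in enumerate(order):
            result[i][j] = rank_means[rank]
    return result
```

For example, a gene with 100 reads, 2,000 bp long, in a library of 1 million mapped reads has an RPKM of 100 / 2 / 1 = 50.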
- Differential expression testing: This step helps identify genes whose expression has changed significantly. Here you take the table of summarized counts and perform statistical tests between the samples (pairwise or multiple-group comparisons) of interest. You can use statistical techniques based on empirical Bayes estimation, the negative binomial distribution, etc.
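The statistical test itself is best left to dedicated count-based packages (for example edgeR or DESeq2, both built on the negative binomial model). The sketch below shows two pieces that surround any such test: the log2 fold change between group means, and the Benjamini-Hochberg adjustment that turns thousands of per-gene p-values into false discovery rates. The function names, the pseudocount of 0.5 and all example numbers, including the p-values, are assumptions for illustration.

```python
import math
from typing import List, Sequence


def log2_fold_change(control: Sequence[float], treated: Sequence[float],
                     pseudocount: float = 0.5) -> float:
    """log2 of mean(treated) / mean(control); the pseudocount avoids log(0)."""
    mean_control = sum(control) / len(control)
    mean_treated = sum(treated) / len(treated)
    return math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical normalized counts (control, treated) and invented p-values;
    # real p-values would come from a proper count-based test.
    genes = {"geneA": ([10, 12, 11], [45, 40, 50]), "geneB": ([30, 28, 33], [29, 31, 30])}
    pvals = [0.001, 0.8]
    fdr = benjamini_hochberg(pvals)
    for (name, (ctrl, trt)), q in zip(genes.items(), fdr):
        print(name, round(log2_fold_change(ctrl, trt), 2), round(q, 4))
```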
Tertiary analysis:
- Downstream analysis: Lists of DE genes give you an estimate of expression trends. You can now use the list(s) for further functional, pathway-centric or network analysis. Remember that most existing downstream analysis tools were designed for gene expression data from microarray experiments, so you should use tools designed for RNA-seq data (for example, fusion transcript detection tools, or enrichment tools designed to use RNA-seq output). Another option is to use only the gene lists for such analysis.
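Using only the gene list, the most basic enrichment test asks whether a pathway contains more DE genes than expected by chance, via the hypergeometric distribution. The sketch below is exactly that plain test and nothing more: it does not correct for RNA-seq-specific biases such as longer genes collecting more reads (and therefore being called DE more often), which is one reason RNA-seq-aware enrichment tools exist. The function names are hypothetical, and the gene identifiers and the choice of universe (for example, all genes that were tested) are inputs you supply.

```python
from math import comb
from typing import Iterable, Tuple


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return hits / comb(population, draws)


def pathway_enrichment(de_genes: Iterable[str], pathway_genes: Iterable[str],
                       universe: Iterable[str]) -> Tuple[int, float]:
    """Overlap size and one-sided over-representation p-value for one gene set."""
    universe_set = set(universe)
    de = set(de_genes) & universe_set
    pathway = set(pathway_genes) & universe_set
    overlap = len(de & pathway)
    return overlap, hypergeom_upper_tail(overlap, len(universe_set), len(pathway), len(de))
```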
Step 4: Interpretation of your results: use the results to assess your hypothesis.
Step 5: Validation using alternative techniques (resequencing your gene of interest, quantifying transcript levels, functional studies, etc.).
P.S. This answer is based on references in my CiteULike library (see rnaseq).
Nice find, I haven't seen this one. I will post it in the tutorial section as well.