Hi, I am working on the plant Arabidopsis. I have samples from 4 plant types (WT, KO1, KO2 and OE1), each Mock treated and stress treated, in three replicates. So there are 24 samples in total (4 plant types x 2 conditions x 3 replicates). I would like to analyse differential gene expression in each plant type between Mock and stress treatment. I am using STAR to align the raw reads to the reference genome and to output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file. I am then using these files for transcript quantification with RSEM. But I am getting this error:
Cannot open /Documents/Umesh/Analysis123/star/GENOME/genome.grp! It may not exist.
So I seek help with the following: 1. Is this the right approach? 2. Can I use other programs for DGE analysis? 3. What is the proper pipeline for this experiment? I am confused by the large number of tools. Please guide me, as I am new to this kind of analysis.
Thanks Umesh
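On the genome.grp error itself: rsem-calculate-expression takes as its reference argument the name prefix that was given to rsem-prepare-reference, and the .grp file is one of the files that command writes. The path in the message suggests the prefix points into the STAR genome folder, so a likely cause (an assumption, since the exact commands are not shown) is that the RSEM reference was never built under that prefix. A minimal sketch for checking this before running RSEM is below; the helper name is hypothetical and the suffix list is only a subset of what RSEM writes.

```python
from pathlib import Path

# Assumption: a subset of the files that rsem-prepare-reference writes
# next to the reference name prefix. RSEM writes more files than these.
RSEM_SUFFIXES = (".grp", ".ti", ".transcripts.fa")


def missing_rsem_reference_files(prefix):
    """Return the expected RSEM reference files that are absent for this prefix.

    `prefix` is the reference name passed to rsem-prepare-reference,
    for example "/path/to/rsem_ref/arabidopsis" (hypothetical path).
    """
    return [prefix + suffix
            for suffix in RSEM_SUFFIXES
            if not Path(prefix + suffix).is_file()]
```

Calling missing_rsem_reference_files("/Documents/Umesh/Analysis123/star/GENOME/genome") would list genome.grp if RSEM cannot find it. If files are missing, building the reference with rsem-prepare-reference and passing the same prefix to rsem-calculate-expression is the usual remedy.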
One straightforward approach would be to use STAR to generate a count table (for example with its --quantMode GeneCounts option) that you can use with edgeR. Look up the edgeR User's Guide: it has a rich description of how to quantify differential gene expression for various experimental designs.
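To make the "count table" step concrete, here is a minimal sketch that merges per-sample STAR ReadsPerGene.out.tab files (written when STAR is run with --quantMode GeneCounts) into one gene-by-sample table. The function names and the sample-to-file mapping are hypothetical. The choice of column depends on library strandedness, which is not stated in the question, so the unstranded default here is an assumption.

```python
import csv

# Columns in ReadsPerGene.out.tab (0-based index):
# 0 = gene ID, 1 = unstranded counts,
# 2 = counts if read 1 is on the gene's strand, 3 = counts if reversed.


def read_star_counts(path, column=1):
    """Read one ReadsPerGene.out.tab file into {gene_id: count}."""
    counts = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines and the summary rows (N_unmapped, N_multimapping, ...).
            if not row or row[0].startswith("N_"):
                continue
            counts[row[0]] = int(row[column])
    return counts


def build_count_matrix(files_by_sample, column=1):
    """Merge per-sample counts into a header and a list of rows."""
    per_sample = {sample: read_star_counts(path, column)
                  for sample, path in files_by_sample.items()}
    samples = list(per_sample)
    genes = sorted(set().union(*per_sample.values()))
    header = ["gene_id"] + samples
    rows = [[gene] + [per_sample[s].get(gene, 0) for s in samples]
            for gene in genes]
    return header, rows


def write_tsv(header, rows, out_path):
    """Write the matrix as a tab-separated file that R can read."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)
```

The resulting file can be loaded in R with read.delim and passed to edgeR as the count matrix.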
Thank you, seidel, for your kind suggestion. I will go through the edgeR User's Guide.
But feeding the transcriptome file to RSEM and using the rounded expected counts from RSEM works fine too. It might work a bit better, because RSEM is smarter about reads which align to multiple features.
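As an illustration of the "rounded expected counts" idea, here is a minimal sketch that reads the expected_count column from an RSEM .genes.results file and rounds it to integers for count-based tools. The function name is hypothetical, and how ties are rounded is a choice made here, not something RSEM prescribes.

```python
import csv


def read_rsem_expected_counts(path):
    """Return {gene_id: rounded expected_count} from an RSEM .genes.results file."""
    counts = {}
    with open(path, newline="") as handle:
        # RSEM results files are tab-separated with a header line
        # that includes the gene_id and expected_count columns.
        for row in csv.DictReader(handle, delimiter="\t"):
            # Python's round() sends exact .5 ties to the nearest even integer.
            counts[row["gene_id"]] = int(round(float(row["expected_count"])))
    return counts
```

The per-sample dictionaries can then be merged into a single matrix in the same way as any other count table.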
To be honest, I've used Salmon (or eXpress) more than RSEM, but I thought I should mention that my own experience with RSEM was different from what I expected after reading benchmark papers. Namely, I believe the issue was that either i) the results seemed a little strange when I ran the Bowtie alignment first (with separate RSEM quantification afterwards), or ii) the alignment run from within RSEM (with the default parameters, I believe) took a prohibitively long time (so I didn't wait to test quantification with multiple samples).
In general, I would recommend budgeting a substantial amount of time for each project (including testing at least a few different methods). Most commonly, I would start with a genome alignment using STAR or TopHat (I think visually inspecting the alignment can be a useful troubleshooting strategy).
In other words, I think starting with STAR is OK (and closer to what I would do for "initial" analysis), but there isn't really one "best" strategy that you can use without taking the time to critically assess your data.
My limited experience with doing STAR alignment inside of RSEM was that the baked-in STAR parameters were really stringent.
There are many choices for doing DGE analysis on RNA-seq data, now supported by many papers. One of the most popular methods is DESeq2. If you want a fast alternative and have a transcriptome ready, then use Salmon + tximport + DESeq2. A more detailed tutorial is here: http://www.sthda.com/english/wiki/rna-seq-differential-expression-work-flow-using-deseq2
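Whichever tool is chosen, both edgeR and DESeq2 need a sample table that describes the design. A minimal sketch for the experiment in the question (4 plant types x Mock/Treated x 3 replicates) is below. The sample naming scheme and the combined "group" column are assumptions made for illustration, not a required format.

```python
from itertools import product

GENOTYPES = ["WT", "KO1", "KO2", "OE1"]
TREATMENTS = ["Mock", "Treated"]
REPLICATES = [1, 2, 3]


def build_sample_sheet():
    """Return one row per sample, with a combined genotype_treatment group label."""
    rows = []
    for genotype, treatment, rep in product(GENOTYPES, TREATMENTS, REPLICATES):
        rows.append({
            "sample": f"{genotype}_{treatment}_rep{rep}",  # hypothetical naming scheme
            "genotype": genotype,
            "treatment": treatment,
            "replicate": rep,
            "group": f"{genotype}_{treatment}",
        })
    return rows


def within_genotype_contrasts():
    """Treated versus Mock within each plant type, as (numerator, denominator) pairs."""
    return [(f"{g}_Treated", f"{g}_Mock") for g in GENOTYPES]
```

With a single group factor like this, each Treated versus Mock comparison within a plant type becomes one contrast between two group levels, which is one of the designs the edgeR and DESeq2 documentation describes.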