I'm taking a look at edgeR. I know R a little, but I'm much more comfortable in Python with SciPy, NumPy, and pandas. What steps does edgeR take to compute differential expression for RNA-seq data? My data is a matrix with samples as rows (5 classes of samples) and genes as columns, where each entry [i,j] is the expression of gene j in sample i.
I saw that edgeR uses Fisher's exact test to provide p-values for changes in expression between samples. Does edgeR log-transform the gene expression values, merge the samples of the same class (by taking the mean?), and then compute pairwise distances between the vectors? Or does it do the log transformation as the last step? A walkthrough of how this works would be really interesting. I try not to treat tools as black boxes, so I want to know how to code the algorithm before I start using tools like this.
Did you read the manual? https://bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf
It contains a wealth of information...
It's not necessarily the case that we can't or don't want to help you, but these questions would be more appropriate on https://support.bioconductor.org/
Thanks, I'll check it out again. The doc is pretty thorough on all of the edgeR features. This question was mostly to get a quick grasp of the concepts. Btw, I didn't realize there was a dedicated site for Bioconductor questions until now. Awesome.
Hi jol, if you have the matrix of raw read counts, you can follow the Trinity steps for differential expression analysis with edgeR: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-Differential-Expression
"A full example of the edgeR pipeline involving combining reads from multiple samples, assembling them using Trinity, separately aligning reads back to the Trinity assemblies, abundance estimation using RSEM, and differential expression analysis using edgeR is provided at: $TRINITY_HOME/sample_data/test_full_edgeR_pipeline"
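One practical note on the count matrix: edgeR works on a matrix with genes as rows and samples as columns, which is the transpose of the samples x genes layout described in the question. Since you're more at home in pandas, here is a minimal sketch of that reshaping step. The gene names, sample names, and the `to_gene_by_sample` helper are made up for illustration, and this only prepares the input table; it does not reproduce anything edgeR itself does.

```python
import pandas as pd


def to_gene_by_sample(sample_by_gene: pd.DataFrame) -> pd.DataFrame:
    """Transpose a samples x genes count table into genes x samples.

    Hypothetical helper for illustration; it is not part of edgeR or Trinity.
    """
    # Raw read counts can never be negative, so treat that as a sign
    # that the table holds something else (e.g. log ratios).
    if (sample_by_gene.to_numpy() < 0).any():
        raise ValueError("raw counts must be non-negative")
    # rename_axis returns a new object, so the input frame is left untouched.
    return sample_by_gene.T.rename_axis("gene_id")


# Toy data in the layout from the question: rows are samples, columns are genes.
toy = pd.DataFrame(
    {"geneA": [10, 12, 0, 3], "geneB": [5, 7, 40, 38], "geneC": [0, 1, 2, 0]},
    index=["classA_rep1", "classA_rep2", "classB_rep1", "classB_rep2"],
)

gene_by_sample = to_gene_by_sample(toy)

# Tab-delimited text with genes as rows and samples as columns.
tsv_text = gene_by_sample.to_csv(sep="\t")
print(tsv_text)
```

To write a file instead of a string, pass a path to `to_csv`. Keep the values as raw counts here, not log-transformed or averaged per class.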