Question

Differential gene expression using edgeR and using a generalised linear model (GLM)

0


5.5 years ago

ali.hakimzadeh ▴ 10

Hi, I am new to "edgeR," and I don't have much background in R programming, since I used to work with Python for my analyses. The counts obtained from HTSeq for the different samples are stored in separate files in different folders; the files are all named "counts.txt" and the folders are named after the samples. The goal is to compare 30 samples with 20 other samples, which represent pro-vaccine and pre-vaccine samples. Each counts.txt file contains two columns, one for gene names and one for counts. As I learned from the documentation, I have to use "readDGE" to read the "counts.txt" file in each folder. I want to do a differential gene expression analysis using a GLM. I also have to identify DE genes using the log2 fold change and the likelihood ratio (LR) test in edgeR. I don't know where to start, how to collate the data into a DGEList, or what the other steps are! I started with some code like the following:

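library(edgeR)
directory1 <- "/home/ali/Desktop/SAMPLES1/"
directory2 <- "/home/ali/Desktop/SAMPLES2/"
# Each sample has its own folder, so search recursively for counts.txt
files1 <- list.files(directory1, pattern = "^counts\\.txt$", recursive = TRUE)
files2 <- list.files(directory2, pattern = "^counts\\.txt$", recursive = TRUE)
# The file paths are relative to each directory, so pass it as `path`
x <- readDGE(files1, path = directory1, columns = c(1, 2), header = FALSE)
d <- readDGE(files2, path = directory2, columns = c(1, 2), header = FALSE)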

And the second fact is that I don't need to normalize my data. Can anyone guide me through the steps I need to follow?

Thanks in advance

RNA-Seq edgeR R • 1.6k views


1


See the manual section "Reading counts from a file" for how to get the htseq output into R and how to run an edgeR analysis. I strongly recommend sticking to the default analysis path rather than adding custom analysis strategies. Assembling a count matrix for all samples (columns = samples, rows = genes) before reading it into R may be preferable, as this is the default input format. This can be done with Unix tools such as cut.
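As an alternative to Unix tools, the same count matrix can be assembled directly in R. The following is only a minimal sketch: it assumes the layout described in the question (one folder per sample, each holding a two-column, header-less, tab-separated counts.txt), and the `root` directory, the sample names, and the `read_counts` helper are hypothetical, with small demo files generated so the example runs on its own.

```r
set.seed(1)

# Hypothetical layout: <root>/<sample>/counts.txt with two tab-separated
# columns (gene ID, count) and no header. Demo files stand in for real data.
root <- file.path(tempdir(), "SAMPLES_demo")
for (s in c("sampleA", "sampleB", "sampleC")) {
  dir.create(file.path(root, s), recursive = TRUE, showWarnings = FALSE)
  demo <- data.frame(gene = paste0("gene", 1:5), count = rpois(5, lambda = 20))
  write.table(demo, file.path(root, s, "counts.txt"), sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = FALSE)
}

# Find every counts.txt below the root; the folder name is the sample name.
files <- list.files(root, pattern = "^counts\\.txt$", recursive = TRUE,
                    full.names = TRUE)
samples <- basename(dirname(files))

# Hypothetical helper: read one file into a named vector of counts.
read_counts <- function(f) {
  tab <- read.delim(f, header = FALSE, col.names = c("gene", "count"),
                    stringsAsFactors = FALSE)
  tab <- tab[!grepl("^__", tab$gene), ]  # drop summary rows such as __no_feature
  setNames(tab$count, tab$gene)
}
per_sample <- lapply(files, read_counts)

# All files must list the same genes in the same order before binding.
genes <- names(per_sample[[1]])
stopifnot(all(vapply(per_sample,
                     function(v) identical(names(v), genes),
                     logical(1))))

# Rows = genes, columns = samples.
counts <- do.call(cbind, per_sample)
colnames(counts) <- samples
```

The resulting `counts` matrix has the genes-by-samples shape that `DGEList(counts = counts, group = ...)` expects.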

And the second fact is that I don't need to normalize my data

What makes you say that? Different samples have different total read counts and library compositions. Normalization is always necessary.
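To show where normalization sits in the default path, here is a rough sketch of an edgeR GLM analysis with a likelihood ratio test. It runs on simulated counts rather than real data, and the group labels ("pre", "pro"), the assignment of 20 and 30 samples to them, and the gene and sample names are assumptions made purely for illustration.

```r
library(edgeR)

set.seed(1)
# Simulated stand-in for the real data: 1000 genes, 20 "pre" and 30 "pro" samples.
n_genes <- 1000
group <- factor(rep(c("pre", "pro"), times = c(20, 30)), levels = c("pre", "pro"))
counts <- matrix(rnbinom(n_genes * length(group), mu = 50, size = 10),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("sample", seq_along(group))))

y <- DGEList(counts = counts, group = group)

# Drop genes with too few reads to be informative.
keep <- filterByExpr(y)
y <- y[keep, , keep.lib.sizes = FALSE]

# TMM normalization: scaling factors that account for differences in
# library composition between samples.
y <- calcNormFactors(y)

# Design matrix with "pre" as the reference level.
design <- model.matrix(~ group)

# Estimate dispersions, fit the negative binomial GLM, and run the
# likelihood ratio test on the second coefficient (pro vs pre).
y <- estimateDisp(y, design)
fit <- glmFit(y, design)
lrt <- glmLRT(fit, coef = 2)

# Top genes with log2 fold change (logFC), LR statistic, p-value and FDR.
topTags(lrt)
```

The `logFC` column of the result is on the log2 scale, so it can be used directly for a log2 fold change cutoff alongside the FDR from the LR test.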

5.5 years ago by ATpoint 85k

0


I have already read the manual, but maybe I have to read it again to understand it well! The way they describe things there is a little complicated and unclear to me. As for the normalization, that is what my supervisor told me to do, though as far as I know it is a necessary step that we have to do!

5.5 years ago by ali.hakimzadeh ▴ 10