Hi, I am new to edgeR and do not have much background in R programming, since I have mostly used Python for my analyses. The counts I obtained from HTSeq for the different samples are stored in separate files in different folders: every file is named "counts.txt" and each folder is named after its sample. The goal is to compare 30 samples with 20 other samples, which represent pro-vaccine and pre-vaccine samples. Each counts.txt file contains two columns, one for gene names and one for counts. As I learned from the documentation, I have to use readDGE to read the counts.txt file in each folder. I want to do a differential gene expression analysis using a GLM, and I have to identify DE genes using the log2 fold change and the likelihood ratio (LR) test in edgeR. I don't know where to start, how to collate the data into a DGEList, or what the other steps are! I started with some code like this:
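library(edgeR)
directory1 <- "/home/ali/Desktop/SAMPLES1/"
directory2 <- "/home/ali/Desktop/SAMPLES2/"
# counts.txt sits in one sub-folder per sample, so search recursively and keep full paths
files1 <- list.files(directory1, pattern = "^counts\\.txt$", recursive = TRUE, full.names = TRUE)
files2 <- list.files(directory2, pattern = "^counts\\.txt$", recursive = TRUE, full.names = TRUE)
# every file has the same name, so label each column by its sample folder instead
x <- readDGE(files1, columns = c(1, 2), labels = basename(dirname(files1)), header = FALSE)
d <- readDGE(files2, columns = c(1, 2), labels = basename(dirname(files2)), header = FALSE)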
The second point is that I don't need to normalize my data. Can anyone guide me on which steps I have to follow?
Thanks in advance
See the manual section on "Reading counts from a file" for how to get the htseq output into R and how to run an edgeR analysis. I strongly recommend sticking to the default analysis path without adding custom analysis strategies. Putting together a count matrix for all samples (columns = samples, rows = genes) before reading it into R might be desirable, as this is the default input format. This can be done with Unix tools such as cut.
You wrote that you don't need to normalize your data. What makes you say that? Different samples have different total read counts and library compositions. Normalization is always necessary.
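To make the default path more concrete, here is a minimal sketch in R. It is an illustration under assumptions, not a definitive recipe: the counts are simulated and written to a temporary folder that mimics the layout from the question (one folder per sample, each holding a two-column counts.txt), the sample names and the "pre"/"post" group labels are hypothetical, and 4 samples per group stand in for the real 30 and 20. With real data you would point list.files() at your own directories and build the group factor from your own sample sheet.

```r
library(edgeR)

# --- Simulated stand-in for the real files (assumption, not the poster's data) ---
# Layout: <root>/<sample name>/counts.txt, tab-separated, two columns, no header.
set.seed(1)
root    <- file.path(tempdir(), "edgeR_demo")
genes   <- paste0("gene", 1:500)
samples <- paste0(rep(c("pre", "post"), each = 4), "_", rep(1:4, 2))
for (s in samples) {
  dir.create(file.path(root, s), recursive = TRUE, showWarnings = FALSE)
  counts <- rnbinom(length(genes), mu = 100, size = 5)
  write.table(data.frame(genes, counts), file.path(root, s, "counts.txt"),
              sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
}

# --- Collect one counts.txt per sample folder ---
files <- list.files(root, pattern = "^counts\\.txt$",
                    recursive = TRUE, full.names = TRUE)
sample_names <- basename(dirname(files))          # folder name = sample name
# Hypothetical rule: the group is the part of the sample name before "_".
group <- factor(sub("_.*$", "", sample_names), levels = c("pre", "post"))

# --- Read all files into a single DGEList ---
y <- readDGE(files, columns = c(1, 2), group = group,
             labels = sample_names, header = FALSE)

# --- Default edgeR GLM path ---
keep   <- filterByExpr(y)                  # drop genes with too few counts
y      <- y[keep, , keep.lib.sizes = FALSE]
y      <- calcNormFactors(y)               # TMM normalization factors
design <- model.matrix(~ group, data = y$samples)
y      <- estimateDisp(y, design)          # dispersion estimates
fit    <- glmFit(y, design)                # negative binomial GLM per gene
lrt    <- glmLRT(fit, coef = 2)            # LR test for post vs pre
topTags(lrt)                               # table includes logFC, LR, PValue, FDR
```

Reading all samples into one object, rather than one object per directory, keeps both groups in the same DGEList so that a single design matrix can describe the comparison. The logFC column reported by topTags() is the log2 fold change of the second group level relative to the first.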
I already read the manual, but maybe I have to read it again to understand it well! The way they describe things there is a little complicated and unclear for me. As for the normalization, that is what my supervisor told me to do, although as far as I know it is a necessary step!