My RNA-seq read count table consists of samples fro two tissues (root and shoot), different time-points (1hr, 3hr, 12hr, 1day etc). and stress and no-stress conditions all in one table. For example:
root_ctrl_rep1 root_ctrl_rep2 root_1hr_rep1 root_1hr_rep2 root_3hr_rep1 root_3hr_rep2 shoot_ctrl_rep1 shoot_ctrl_rep2 shoot_1hr_rep1 shoot_1hr_rep2 shoot_3hr_rep1 shoot_3hr_rep2
As of now, I am interested only in a basic DEG analysis between the following pairs
root_ctrl vs root_1hr
root_ctrl vs root_3hr
shoot_ ctrl vs shoot_1hr
shoot_ctrl vs shoot_3hr
----and so on
How do go about this ?.
library("DESeq2")
count_table<-read.table("counts.txt", header=T, row.names=1, sep="\t")
exp_design<-data.frame(row.names=colnames(count_table), condition = c("root_ctrl","root_1hr","root_6hr","root_12hr","shoot_ctrl","shoot_1hr","shoot_6hr","shoot_12hr","shoot_day1","root_ctrl","root_3hr","root_6hr","root_12hr","root_day1","shoot_ctrl","shoot_1hr","shoot_3hr","shoot_6hr","shoot_12hr","shoot_12hr","shoot_day1","root_1hr","root_3hr","root_day1","shoot_3hr","shoot_3hr"))
conditions=exp_design$condition
ds <- DESeqDataSetFromMatrix(countData=count_table, exp_design, formula(~condition))
I am not sure if the above is right and what to do next. Any suggestion is greatly appreciated. Thanks
Thanks for the pointers. I will try that out
Hi Devon, I was going through the DESeq (and DESeq2) papers to understand the models they are using, and also because I am a novice in this field. In the paper, they are calculating the size-factor which is explained here : http://seqanswers.com/forums/showpost.php?p=16468&postcount=13
So in my case, if I want to compare root_ctrl (2 replicates) with root_12hr(2 replicates), the library size should be calculated over 4 samples or all the samples in the count matrix (26 in this case) ? Thanks
DESEq2 does size-factor normalization automatically. So you don't have to calculate the size factors unless you need it for some specific reason. I would strongly suggest reading the DESeq2 pipeline, which will help to clarify many of your doubts
https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
Yes, I started with the DESeq2 paper, but the kept referring to the previous paper so had to read that one to get an idea.
You can calculate the size factors over all of the samples. The only time that that's problematic is if there's a large (>10x) difference in coverage between some of the groups (in my experience that tends to throw things off). So don't worry about computing size factors only once unless you have funky data like that.
Thanks a lot for the input !
Hi Devon, one last query. I am also interested in constructing co-expression networks with the rlog transformed data.
rlogged_data <- rlog(ds2)
How do I export rlogged_data object as a matrix (text or csv) for further analysis in WGCNA ? Thanks
Presumably writing
assay(rlogged_data)
to a file would suffice.