Hey guys,
I am new to DESeq2 and I am trying to following a template on , and I am trying to analyze the bacterial genes that are differentially expressed in tumor vs normal tissue (each tumor and normal tissue are paired). There are 12 patients, therefore 12 tumors and 12 matching normal tissues in total. When I did an unpaired analysis, I found 60 genes are differentially expressed but when I did paired analysis, no gene is differentially expressed. Can anyone tell me whether I am doing this right? The following is the script:
Import data from featureCounts
Previously ran at command line something like this:
featureCounts -a genes.gtf -o counts.txt -T 12 -t exon -g gene_id GSM*.sam
countdata <- read.table("counts.txt", header=TRUE, row.names=1)
Remove first five columns (chr, start, end, strand, length)
countdata <- countdata[ ,6:ncol(countdata)]
Remove .bam or .sam from filenames
colnames(countdata) <- gsub("\.[sb]am$", "", colnames(countdata))
Convert to matrix
countdata <- as.matrix(countdata) head(countdata)
Assign condition (first four are controls, second four contain the expansion)
(condition <- factor(c(rep("ctl", 12), rep("exp", 12))))
Analysis with DESeq2 ----------------------------------------------------
library(DESeq2)
Create a coldata frame and instantiate the DESeqDataSet. See ?DESeqDataSetFromMatrix
(coldata <- data.frame(row.names=colnames(countdata), condition)) dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition) dds class: DESeqDataSet dim: 1997 24 metadata(0): assays(1): counts rownames(1997): FN1496 FN1497 ... FN1493 FN1494 rowRanges metadata column names(0): colnames(24): FusoSRR317086 FusoSRR317087 ... FusoSRR317108 FusoSRR317109 colData names(1): condition
add patient information
dds$patient=c("p12","p7","p1","p4","p8","p6","p5","p11","p2","p9","p10","p3","p12","p7","p8","p4","p6","p10","p5","p2","p11","p1","p3","p9") (coldata <- data.frame(row.names=colnames(countdata), condition, dds$patient)) dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~ condition + dds.patient) dds class: DESeqDataSet dim: 1997 24 metadata(0): assays(1): counts rownames(1997): FN1496 FN1497 ... FN1493 FN1494 rowRanges metadata column names(0): colnames(24): FusoSRR317086 FusoSRR317087 ... FusoSRR317108 FusoSRR317109 colData names(2): condition dds.patient
Thank you guys so much!
Could you post your design matrix, please; and explain why you use 'dds.patient' rather than 'patient' when setting it up
Do you mean this? DESeqDataSet dim: 1997 24 metadata(0): assays(1): counts rownames(1997): FN1496 FN1497 ... FN1493 FN1494 rowRanges metadata column names(0): colnames(24): FusoSRR317086 FusoSRR317087 ... FusoSRR317108 FusoSRR317109 colData names(2): condition dds.patient
No, I mean, could you please post it's structure, so that I can tell from first principles whether there is some error in the way you are setting up the paired analysis. That is, what does the following look like:
(Intercept) conditionexp dds.patientp10 dds.patientp11 dds.patientp12 dds.patientp2 dds.patientp3 dds.patientp4 dds.patientp5 dds.patientp6 dds.patientp7 dds.patientp8 dds.patientp9 FusoSRR317086 1 0 0 0 1 0 0 0 0 0 0 0 0 FusoSRR317087 1 0 0 0 0 0 0 0 0 0 1 0 0 FusoSRR317088 1 0 0 0 0 0 0 0 0 0 0 0 0 FusoSRR317089 1 0 0 0 0 0 0 1 0 0 0 0 0 FusoSRR317090 1 0 0 0 0 0 0 0 0 0 0 1 0 FusoSRR317091 1 0 0 0 0 0 0 0 0 1 0 0 0 FusoSRR317092 1 0 0 0 0 0 0 0 1 0 0 0 0 FusoSRR317093 1 0 0 1 0 0 0 0 0 0 0 0 0 FusoSRR317094 1 0 0 0 0 1 0 0 0 0 0 0 0 FusoSRR317095 1 0 0 0 0 0 0 0 0 0 0 0 1 FusoSRR317096 1 0 1 0 0 0 0 0 0 0 0 0 0 FusoSRR317097 1 0 0 0 0 0 1 0 0 0 0 0 0 FusoSRR317098 1 1 0 0 1 0 0 0 0 0 0 0 0 FusoSRR317099 1 1 0 0 0 0 0 0 0 0 1 0 0 FusoSRR317100 1 1 0 0 0 0 0 0 0 0 0 1 0 FusoSRR317101 1 1 0 0 0 0 0 1 0 0 0 0 0 FusoSRR317102 1 1 0 0 0 0 0 0 0 1 0 0 0 FusoSRR317103 1 1 1 0 0 0 0 0 0 0 0 0 0 FusoSRR317104 1 1 0 0 0 0 0 0 1 0 0 0 0 FusoSRR317105 1 1 0 0 0 1 0 0 0 0 0 0 0 FusoSRR317106 1 1 0 1 0 0 0 0 0 0 0 0 0 FusoSRR317107 1 1 0 0 0 0 0 0 0 0 0 0 0 FusoSRR317108 1 1 0 0 0 0 1 0 0 0 0 0 0 FusoSRR317109 1 1 0 0 0 0 0 0 0 0 0 0 1 attr(,"assign") [1] 0 1 2 2 2 2 2 2 2 2 2 2 2 attr(,"contrasts") attr(,"contrasts")$condition [1] "contr.treatment" attr(,"contrasts")$dds.patient [1] "contr.treatment"
Thank you so much russhh! This is what pops up after I put in your command:
Sorry for posting an image, as the output exceeded the word limits. Thanks again!
Thanks, the matrix looks fine. When you run results(dds), does it state "log2 fold change (MAP): condition exp vs ctl"? If not you might need to define contrasts or rewrite the design as '~ dds.patient + condition' (for consistency with the DESeq2 vignette). Otherwise, it might just be that you have no significant genes at your chosen cutoff (perhaps check some of the 60 genes that were putative diffexes when ran without the patient matching visually).
Thanks russhh! That`s a good suggestion! I will try that:)
Thanks again for the suggestion, russhh! It is now working (rewrite the design as '~ dds.patient + condition')! Thank you so much! So is it statistically correct to put dds.patient in front of condition?
I wouldn't say it's the 'statistically correct' thing to do. It's pure implementation: DESeq2 must use the (coef for the) final column of the design matrix as it's default contrast coefficient (ie, the thing that is compared against zero in your hypothesis test). The information contained in the two design matrices is exactly the same. There'll be details on how to specify more complicated designs and alternative contrasts within the vignette for DESeq2