Import data from featureCounts

Question

DESeq2 paired analysis

2

Entering edit mode

8.2 years ago

dazhudou1122 ▴ 140

Hey guys,

I am new to DESeq2 and I am trying to following a template on , and I am trying to analyze the bacterial genes that are differentially expressed in tumor vs normal tissue (each tumor and normal tissue are paired). There are 12 patients, therefore 12 tumors and 12 matching normal tissues in total. When I did an unpaired analysis, I found 60 genes are differentially expressed but when I did paired analysis, no gene is differentially expressed. Can anyone tell me whether I am doing this right? The following is the script:

Import data from featureCounts

Previously ran at command line something like this:

featureCounts -a genes.gtf -o counts.txt -T 12 -t exon -g gene_id GSM*.sam

countdata <- read.table("counts.txt", header=TRUE, row.names=1)

Remove first five columns (chr, start, end, strand, length)

countdata <- countdata[ ,6:ncol(countdata)]

Remove .bam or .sam from filenames

colnames(countdata) <- gsub("\.[sb]am$", "", colnames(countdata))

Convert to matrix

countdata <- as.matrix(countdata) head(countdata)

Assign condition (first four are controls, second four contain the expansion)

(condition <- factor(c(rep("ctl", 12), rep("exp", 12))))

Analysis with DESeq2 ----------------------------------------------------

library(DESeq2)

Create a coldata frame and instantiate the DESeqDataSet. See ?DESeqDataSetFromMatrix

(coldata <- data.frame(row.names=colnames(countdata), condition)) dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition) dds class: DESeqDataSet dim: 1997 24 metadata(0): assays(1): counts rownames(1997): FN1496 FN1497 ... FN1493 FN1494 rowRanges metadata column names(0): colnames(24): FusoSRR317086 FusoSRR317087 ... FusoSRR317108 FusoSRR317109 colData names(1): condition

add patient information

dds$patient=c("p12","p7","p1","p4","p8","p6","p5","p11","p2","p9","p10","p3","p12","p7","p8","p4","p6","p10","p5","p2","p11","p1","p3","p9") (coldata <- data.frame(row.names=colnames(countdata), condition, dds$patient)) dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~ condition + dds.patient) dds class: DESeqDataSet dim: 1997 24 metadata(0): assays(1): counts rownames(1997): FN1496 FN1497 ... FN1493 FN1494 rowRanges metadata column names(0): colnames(24): FusoSRR317086 FusoSRR317087 ... FusoSRR317108 FusoSRR317109 colData names(2): condition dds.patient

Thank you guys so much!

RNA-Seq R • 6.0k views

ADD COMMENT • link 8.2 years ago by dazhudou1122 ▴ 140

1

Entering edit mode

Could you post your design matrix, please; and explain why you use 'dds.patient' rather than 'patient' when setting it up

ADD REPLY • link 8.2 years ago by russhh 5.7k

0

Entering edit mode

Do you mean this? DESeqDataSet dim: 1997 24 metadata(0): assays(1): counts rownames(1997): FN1496 FN1497 ... FN1493 FN1494 rowRanges metadata column names(0): colnames(24): FusoSRR317086 FusoSRR317087 ... FusoSRR317108 FusoSRR317109 colData names(2): condition dds.patient

ADD REPLY • link 8.2 years ago by dazhudou1122 ▴ 140

0

Entering edit mode

No, I mean, could you please post it's structure, so that I can tell from first principles whether there is some error in the way you are setting up the paired analysis. That is, what does the following look like:

model.matrix(~ condition + dds.patient, data = coldata)

ADD REPLY • link 8.2 years ago by russhh 5.7k

0

Entering edit mode

(Intercept) conditionexp FusoSRR317086 1 FusoSRR317087 1 FusoSRR317088 1 FusoSRR317089 1 FusoSRR317090 1 FusoSRR317091 1 FusoSRR317092 1 FusoSRR317093 1 FusoSRR317094 1 FusoSRR317095 1 FusoSRR317096 1 FusoSRR317097 1 FusoSRR317098 1 FusoSRR317099 1 FusoSRR317100 1 FusoSRR317101 1 FusoSRR317102 1 FusoSRR317103 1 FusoSRR317104 1 FusoSRR317105 1 FusoSRR317106 1 FusoSRR317107 1 FusoSRR317108 1 FusoSRR317109 1 attr(,"assign") [1] 0 1 2 2 2 2 2 2 2 2 2 2 2 attr(,"contrasts") attr(,"contrasts")$condition [1] "contr.treatment" attr(,"contrasts")$dds.patient [1] "contr.treatment" dds.patientp10 dds.patientp11 dds.patientp12 dds.patientp2 dds.patientp3 dds.patientp4 dds.patientp5 dds.patientp6 dds.patientp7 dds.patientp8 dds.patientp9 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1

ADD REPLY • link 8.2 years ago by dazhudou1122 ▴ 140

0

Entering edit mode

enter image description here

ADD REPLY • link 8.2 years ago by dazhudou1122 ▴ 140

0

Entering edit mode

Thank you so much russhh! This is what pops up after I put in your command: enter image description here

Sorry for posting an image, as the output exceeded the word limits. Thanks again!

ADD REPLY • link 8.2 years ago by dazhudou1122 ▴ 140

0

Entering edit mode

Thanks, the matrix looks fine. When you run results(dds), does it state "log2 fold change (MAP): condition exp vs ctl"? If not you might need to define contrasts or rewrite the design as '~ dds.patient + condition' (for consistency with the DESeq2 vignette). Otherwise, it might just be that you have no significant genes at your chosen cutoff (perhaps check some of the 60 genes that were putative diffexes when ran without the patient matching visually).

ADD REPLY • link 8.2 years ago by russhh 5.7k

0

Entering edit mode

Thanks russhh! That`s a good suggestion! I will try that:)

ADD REPLY • link 8.2 years ago by dazhudou1122 ▴ 140

0

Entering edit mode

Thanks again for the suggestion, russhh! It is now working (rewrite the design as '~ dds.patient + condition')! Thank you so much! So is it statistically correct to put dds.patient in front of condition?

ADD REPLY • link 8.2 years ago by dazhudou1122 ▴ 140

1

Entering edit mode

I wouldn't say it's the 'statistically correct' thing to do. It's pure implementation: DESeq2 must use the (coef for the) final column of the design matrix as it's default contrast coefficient (ie, the thing that is compared against zero in your hypothesis test). The information contained in the two design matrices is exactly the same. There'll be details on how to specify more complicated designs and alternative contrasts within the vignette for DESeq2

ADD REPLY • link 8.2 years ago by russhh 5.7k