Hi Folks,
I am currently working on 16 bulk RNAseq samples. I did some basic research on statistical testing of differential expression and figured out two R packages to edgeR and DESeq2.
The 16 samples are actually paired samples data from before treatment and after treatment from 8 patients. I am interested in finding DE genes within paired samples for instance (P1_AT vs P1_BT). This is the design matrix
Sample Patient Treatment
P1_BT P1 BT
P1_AT P1 AT
P2_BT P2 BT
P2_AT P2 AT
........
P8_BT P8 BT
P8_AT P8 AT
So I have created my design matrix like section 4.1
- design <- model.matrix(~ Patient+Treatment)
EdgeR manual, my experiment design is exactly similar to section 4.1 of edgeR manual.
4.1 RNA-Seq of oral carcinomas vs matched normaltissue
Patient <- factor(c(8,8,33,33,51,51))
Tissue <- factor(c("N","T","N","T","N","T"))
data.frame(Sample=colnames(y),Patient,Tissue)
Sample Patient Tissue
8N 8 N
8T 8 T
33N 33 N
33T 33 T
51N 51 N
51T 51 T
design <- model.matrix(~Patient+Tissue)
rownames(design) <- colnames(y)
design
(Intercept) Patient33 Patient51 TissueT
8N 1 0 0 0
8T 1 0 0 1
33N 1 1 0 0
33T 1 1 0 1
51N 1 0 1 0
51T 1 0 1 1
fit <- glmFit(y, design)
lrt <- glmLRT(fit)
topTags(lrt)
Coefficient: TissueT
RefSeqID Symbol NbrOfExons logFC logCPM LR PValue FDR
5737 NM_001039585 PTGFR 4 -5.18 4.74 98.7 2.97e-23 3.12e-19
5744 NM_002820 PTHLH 4 3.97 6.21 92.2 8.00e-22 4.21e-18
3479 NM_001111283 IGF1 5 -3.99 5.71 86.5 1.38e-20 4.85e-17
colnames(design)[1] "(Intercept)" "Patient33" "Patient51" "TissueT"
But I am interested in getting 1) P1_AT vs P1_BT, 2) P2_AT vs P2_BT ... 8) P8_AT vs P8_BT
To find genes with baseline differences between the drug and the placebo at 0 hours:
qlf <- glmQLFTest(fit, contrast=my.contrasts[,"DrugvsPlacebo.0h"])
my.contrasts <- makeContrasts(
+ Drug.1vs0 = Drug.1h-Drug.0h,
+ Drug.2vs0 = Drug.2h-Drug.0h,
+ Placebo.1vs0 = Placebo.1h-Placebo.0h,
+ Placebo.2vs0 = Placebo.2h-Placebo.0h,
+ DrugvsPlacebo.0h = Drug.0h-Placebo.0h,
+ DrugvsPlacebo.1h = (Drug.1h-Drug.0h)-(Placebo.1h-Placebo.0h),
+ DrugvsPlacebo.2h = (Drug.2h-Drug.0h)-(Placebo.2h-Placebo.0h),
+ levels=design)
The design matrix is different for section 4.1 and section 3.3.1. My experiment similar to section 4.1, so I have used that design matrix. But I want the results of section 3.3.1 within samples comparison.
Any comments are appreciated.
Thanks, Kevin for helping me out. I have made some changes to the original post by replacing the screenshots and changing in the design matrix from "Time" to "Treatment".
I want to compare and find DE genes within paired samples like
"P1_AT vs P1_BT", "P2_AT vs P2_BT" and so on
?In the above post, two sections from edgeR (section 3.3.1-Multifactor analysis and section 4.1-Paired).
From section4.1, I need some help in understanding the output. f
colnames(design)[1] "(Intercept)" "Patient33" "Patient51" "TissueT"
This is my understanding, TissueT refers to comparison of all tumor samples (8T, 33T, 51T) vs all normal samples (8N, 33N, 51N). Am I correct?
What about Patient33, does it mean Patient33 (N and T) vs Patient8 (N and T)?
What about Patient51, does it mean Patient51 (N and T) vs Patient8 (N and T)?
In order to compare, Patient51 N vs Patient51 T, what design and contrast should be selected?
Yes, this is correct. However, to add, due to the fact that
Patient
is also included in the design formula (~ Patient + Tissue
), the results that you derive for TissueT are controlled for the Tumor-Normal pairing, i.e., it is already a paired analysis.No, these do not have much useful meaning, in this situation. 'Patient33' would be: Patient33 (T+N) vs [Patient51 (T+N) & Patient8 (T+N)]
You should not derive p-values from a 1 versus 1 comparison. It would be better to use TissueT, which will give you stats for Tumour versus Normal, while controlling for patient-specific effects.
Hi Kevin
I have a very similar situation and also interested in getting the below kind of comparison for my dataset.
Took the content from edgeR manual,
I want to compare Subject 1_C vs Subject 1_T, Subject 2_C vs Subject 2_T and Subject 3_C vs Subject 3_T. Therefore I created the new design.
setting dispersion to NA (bcos there are no replicates) but in the attached screenshot from section 3.4.1 (Paired Samples).
Both the control and the treatment are administered to each of three patients. They are comparing the treatment to the control for each patient separately.
How to compare them separately?
What mistake I have done?