About the sample: There are 40 samples in which 10 samples are healthy controls and 30 samples are diseased (10 of type-1 + 10 of type-2 + 10 of type-3). I wish to perform differential expression analysis between the 30 diseased and 10 controls, i.e. 30_diseased_vs_10_control. The accession id of the sample is GSE65127, through which I have retrieved the expression-set (using exprs(x)
), pheno-data (using pData(x)
) and feature-data (using fData(x)
) respectively.
Reproducible Code for differential expression that I am using
e_data <- read.csv("GSE65127_expression_set.csv", row.names = 1)
p_data <- read.csv("GSE65127_pheno_set.csv",row.names = 1)
f_data <- read.csv("GSE65127_feature_set.csv",row.names = 1)
design_labels <- p_data$source_name_ch1
design_factors <- factor(design_labels)
design <- model.matrix(~0+design_factors)
colnames(design) <- levels(design_factors)
fit <- lmFit(e_data, design)
cont.matrix <- makeContrasts(disease_vs_healthy = "(Lesional+Non.Lesional+Peri-Lesional)-(Healthy.Volunteer)",levels=design)
Error: While making the cont.matrix
, I am getting the following message
Error in makeContrasts(disease_vs_healthy = "(Lesional+Non.Lesional+Peri-Lesional)-(Healthy.Volunteer)", :
The levels must by syntactically valid names in R, see help(make.names). Non-valid names: Healthy Volunteer,Non Lesional,Peri-Lesional
I am using this workflow: https://www.bioconductor.org/help/course-materials/2005/BioC2005/labs/lab01/estrogen/
Rewrite your
design_factors
; eg, usedesign_factors <- factor(make_names(design_labels))
. The column names of your design matrix are not the same as the levels used in your makeContrasts string.Hi! thank you, it worked. However, after running the testing with the following code
cont.matrix <- makeContrasts(disease_vs_healthy = (Lesional+Non.Lesional+Peri.Lesional)-(Healthy.Volunteer),levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
results <- decideTests(fit2)
summary(results)
All my genes are coming to be upregulated in
summary()
disease_vs_healthy
Down 0
NotSig 0
Up 54675
When I do the same testing with
Geo2R
, I get significant results.Is that surprising though? Your code works, but it is wrong. Let's suppose I'm a gene that is expressed at the same level in your four groups. Say my fitted value for treatment-group G, is
x + (noise_G)
. With your contrast as currently defined, the value of your contrast would be (x + noise_lesional) + (x + noise_non_lesional) + (x + noise_peri_lesional) - (x + noise_healthy) = 2x + some noise. If x is appreciably bigger than the inter-group noise, this contrast will be significant. Fix your contrast.Remove the quotation marks within the contrast function and please show the content of
design
.Hi! Thanks for the quick reply, I have tried by removing the
""
but still got the same errorHere is the
design
> design
Healthy Volunteer Lesional Non Lesional Peri-Lesional
1 1 0 0 0
2 1 0 0 0
3 1 0 0 0
4 1 0 0 0
5 1 0 0 0
6 1 0 0 0
7 1 0 0 0
8 1 0 0 0
9 1 0 0 0
10 1 0 0 0
11 0 1 0 0
12 0 1 0 0
13 0 1 0 0
14 0 1 0 0
15 0 1 0 0
16 0 1 0 0
17 0 1 0 0
18 0 1 0 0
19 0 1 0 0
20 0 1 0 0
21 0 0 1 0
22 0 0 1 0
23 0 0 1 0
24 0 0 1 0
25 0 0 1 0
26 0 0 1 0
27 0 0 1 0
28 0 0 1 0
29 0 0 1 0
30 0 0 1 0
31 0 0 0 1
32 0 0 0 1
33 0 0 0 1
34 0 0 0 1
35 0 0 0 1
36 0 0 0 1
37 0 0 0 1
38 0 0 0 1
39 0 0 0 1
40 0 0 0 1
attr(,"assign")
[1] 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$design_factors
[1] "contr.treatment"
I am new to Biostars. Hope this is the correct format for pasting matrix results