Entering edit mode
6.5 years ago
Leite
★
1.3k
Hello everyone,
I tried to help on a post one month ago, but without success.
I continue to have problems analyzing the HumanHT-12 chip, I'm trying to follow this code by @Gordon Smyth.
My biggest problem is to split the groups, how to tell the R what Healthy controls and the patients.
> library(limma)
> x <- read.ilmn("GSE74629_non-normalized.txt",expr="SAMPLE ",probeid="ID_REF")
Reading file GSE74629_non-normalized.txt ... ...
> y <- neqc(x)
Note: inferring mean and variance of negative control probe intensities from the
detection p-values.
> Group <- rep(c("PDAC","Healthy"),c(36,14))
> Group <- factor(Group)
> design <- model.matrix(~Group)
> keep <- rowSums(y$E>5) >= 14
> y2 <- y[keep,]
> fit <- lmFit(y2,design)
> fit <- eBayes(fit,trend=TRUE,robust=TRUE)
> topTable(fit,coef=2)
Can I use something like targets <- readTargets ("targets.txt")
? and then say what are the controls and patients:
#Build the design matrix for the linear modelling function
f <- factor(targets$Target, levels = unique(targets$Target))
design <- model.matrix(~0 + f)
colnames(design) <- levels(f)
#Create a contrast matrix
contrast.matrix <- makeContrasts("patients-control", levels=design)
Thank you all,
Leite
Yes you can. Take note that
topTable()
is deprecated in favour oftopTreat()
- the post you are following is quite old, I would suggest you follow the limma Users Guide instead.Hey h.mon, ty so much. I'll try to do this. the difference
topTreat()
andtopTable()
is that intopTreat
I can use more parameters to filter, or have more things?The difference is
topTreat()
better controls false discovery rate, read the Note section on?topTreat
.Thank you so much again.
Just to add: The example that Gordon posted (on Bioconductor) produces the same type of model matrix that we normally create via information supplied via
readTargets
. One can create the model matrix in any way, though. You do not necessarily have to usereadTargets
.For limma, for any array source, you just need to get your sample files into a data matrix, with samples as columns and genes as rows. Then, you just need to specify a model matrix (via any means) that defines your sample groupings.
Be aware, also, that Gordon may have only be supplying sample code and that the line
Group <- rep(c("PDAC","Healthy"),c(36,14))
, i.e., 36 PAC versus 14 controls, may not reflect the actual numbers of these groups in the datasetHey Kevin,
Thank you so much for your reply. I decided to use the normalized data, it seemed to be the best and easiest way to analyze this data. I made a breakthrough this weekend and my R code is like this:
I'm studying to see if there's any mistake in it, what do you guys think?
Best,
Leite
Boa tade Leite, and how do the p-values appear? A good measure is to take the most differentially expressed genes and see if they separate your groups in clstering + heatmap.
Boa tarde Kevin,
I can find many genes with an adjusted p <0.05. I'll try what you mentioned. What does not make me very comfortable with this code is this function
eset <- readExpressionSet
I'm still studying more about it.That function appears to just 'coerce' (convert) a data matrix into an Expression Set object, which is require for Limma.
The only other thing that you need to ensure is that your column names (in
eset
) are in the same order as your samples intargets
.Oi Kevin,
This point I had not thought of. "The only other thing that you need to ensure is that your column names (in
eset
) are in the same order as your samples intargets
."Thank you very much for this super tip, it made all the difference now.