Entering edit mode
3.5 years ago
shanasabri
▴
40
I have sample metadata set up as:
> metadata
Protocol Donor Condition
A TruSeq 1629 A
B TruSeq 1629 B
C TruSeq 1629 C
D TruSeq 1630 A
E TruSeq 1630 B
F TruSeq 1630 C
G Tn5 1629 A
H Tn5 1629 B
I Tn5 1629 C
J Tn5 1630 A
K Tn5 1630 B
L Tn5 1630 C
M UII 1629 A
N UII 1629 B
O UII 1629 C
P UII 1630 A
Q UII 1630 B
R UII 1630 C
I'd like to test for differences in Protocols while correcting for baseline Donor and Condition effects. How exactly can EdgeR do this? Does this need to be computed for each pairwise comparison (Truseq vs Tn5, Truseq vs UII, etc) or can EdgeR compute a sort of 1 versus all other comparison (Truseq vs {TN5, UII})?
My current workflow looks something like this:
y <- DGEList(counts = raw_counts_df)
keep <- filterByExpr(y)
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y)
design <- model.matrix(~Protocol + Donor + Condition, data = metadata)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)
# qlf <- glmQLFTest(fit, coef=????) or qlf <- glmQLFTest(fit, contrast=????)
Any help would be greatly appreciated!
In your case, the two variable donor and condition are factors (categorical data) and not covariates (continious), so u can use both
Both are same for multifactor analysis like urs. So u can go for the second option.
After the design, create a contrast of the pairs u want to compare.
Hope this helps.
You can refer this to know more.
To answer to your query 1 vs rest, read section 5.4
If I specify
results <- glmQLFTest(fit, contrast = contr_mtx)
then I'll get a results table with the following columns:"logFC.truevstn5" "logFC.truevsUII" "logFC.UIIvstn5" "logCPM" "F" "PValue"
So I assume this is the correct way to get pairwise combos. I don't seem to have a P-value associated with each comparison. What does the PValue column correspond to in this case? Some average?I'm looking through the documentation and don't really understand how to do a 1 vs all other comparison, for example True vs Tn5+UII. Any additional insights?
Hmm the code above errors out:
Got it.
U are using glmQLFit and not lmFit The contrasts.fit expects Linear Model Fit
So what you can do is pass contrast while fitting the glm, then followed by Quasi-likelihood Tests
Hope this helps.
Got it. That works but I'm still unclear on how to extrapolate protocol-specific genes. The output of results will give me a list of differential genes but none seem to be enriched in a given protocol, rather they seem enriched by condition.. Here's a volcano plot of the results. It seems a top diffexp gene is CPLV. When I look at this gene I see it's enriched in a given condition, not protocol.