Hi,
I have rnaseq (some disease) samples from 7 families and my sample info is described below (affected=A, normal=N):
sample,family,condition,sex
AB1,1,A,F
AB2,1,N,F
AB3,2,A,M
AB4,2,N,F
AB5,3,A,M
AB6,3,N,M
AB7,3,N,F
AB8,4,A,M
AB9,4,N,F
AB10,5,A,F
AB11,5,N,M
AB12,5,N,F
AB13,6,A,M
AB14,6,N,M
AB15,6,N,F
AB16,7,A,M
AB17,7,N,F
I am interested in the following comparisons:
1) condition_A_vs_N
2) All pairwise comparisons between affected samples from different families: A1_vs_A2, A1_vs_A3, .... (16 comparisons/DEGs lists).
3) Comparisons between affected samples from different families vs overall normal: A1_vs_N, A2,_vs_N, ...(7 DEGs lists)
I have tried:
1)
dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ family + condition )
dds$group <- factor(paste0(dds$condition,dds$family))
design(dds) <- ~ group
dds <- DESeq(dds)
resultsNames(dds)
[1] "Intercept" "group_N2_vs_N1" "group_N3_vs_N1" "group_N4_vs_N1" "group_N5_vs_N1" "group_N6_vs_N1" "group_A1_vs_N1" "group_A2_vs_N1"
[9] "group_A3_vs_N1" "group_A4_vs_N1" "group_A5_vs_N1" "group_A6_vs_N1"
If I use contrast option, I can get 2. (one of the comparisons as an example), but not 1 and 3:
results(dds, contrast=c("group", "A1", "A2"))
log2 fold change (MLE): group A1 vs A2
Wald test p-value: group A1 vs A2
....................
2)
dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ family + condition + family:condition)
dds <- DESeq(dds)
resultsNames(dds)
[1] "Intercept" "family_2_vs_1" "family_3_vs_1" "family_4_vs_1" "family_5_vs_1" "family_6_vs_1" "condition_A_vs_N"
[8] "family2.conditionA" "family3.conditionA" "family4.conditionA" "family5.conditionA" "family6.conditionA"
Here, I get 1 (condition_A_vs_N), but not 2 and 3.
Do I need to use different design models to get the required comparisons? Please guide how should I proceed so that I can get 1, 2 and 3.
Also, as per manual, "The LRT is therefore useful for testing multiple terms at once, for example testing 3 or more levels of a factor at once, or all interactions between two variables." Here I have more than 3 levels, should I go for LRT?
Thanks.
Your comparison 2 is essentially a series of of 1 replicate vs one replicate comparisons, and I would almost always recommend against it. Replicates are necessary because it is necessary to estimate the dispersion for gene expression estimates. In theory it might be possible to estimate dispersion by assuming no differential expression, and effectively use all 16 samples simultaneously to estimate dispersion, but I don't know if DESeq2 can do this, I know that edgeR used to be able to do this, but this seems sub-optimal.
The situation is a little better in the case of your comparison three, because one of the groups has replicates, while the other doesn't, and it should be possible to estimate dispersion from the family corrected normals and assume that this disperision is the same for the affecteds, but its still not ideal.
What is the research questions you are trying to answer? Why do you need to know about the families independently?