Hello,
I have an experiment with paired samples (normal vs tumour): each patient contributed one biopsy of normal tissue and one biopsy of tumour tissue. The patients differ in their response to a treatment (respuesta = response, recidiva = relapse). I would like to find the genes that are differentially expressed between respuesta and recidiva while accounting for the pairing of the samples.
Second, among the patients with respuesta status (the responders), I would like to find the genes that are differentially expressed between treatments.
I tried to build the design matrix for the first question (respuesta vs recidiva), but I get an error saying that the design matrix is not of full rank. I have followed the examples in the limma and edgeR documentation, but I must be missing something.
I would also like to know what the design matrix and the contrasts should be for the second analysis (differences between treatments). There are three treatments in total, although only two appear in the example below.
head(summarydata, n=12)
sample tissue names respuesta tratamiento Proyect files
1 04 N 04S RESPUESTA CIR+RT fase1 salmon_results/totalRNA/04S/quant.sf
2 04 T 04T RESPUESTA CIR+RT fase1 salmon_results/totalRNA/04T/quant.sf
3 10 N 10S RESPUESTA CIR fase1 salmon_results/totalRNA/10S/quant.sf
4 10 T 10T RESPUESTA CIR fase1 salmon_results/totalRNA/10T/quant.sf
5 11FIDIS N 11S-FIDIS RECIDIVA CIR fase1 salmon_results/totalRNA/11S-FIDIS/quant.sf
6 11FIDIS T 11T-FIDIS RECIDIVA CIR fase1 salmon_results/totalRNA/11T-FIDIS/quant.sf
7 12 N 12S RESPUESTA CIR fase1 salmon_results/totalRNA/12S/quant.sf
8 12 T 12T RESPUESTA CIR fase1 salmon_results/totalRNA/12T/quant.sf
9 12FIDIS N 12S-FIDIS RESPUESTA CIR+RT fase1 salmon_results/totalRNA/12S-FIDIS/quant.sf
10 12FIDIS T 12T-FIDIS RESPUESTA CIR+RT fase1 salmon_results/totalRNA/12T-FIDIS/quant.sf
11 13FIDIS N 13S-FIDIS RECIDIVA CIR fase1 salmon_results/totalRNA/13S-FIDIS/quant.sf
12 13FIDIS T 13T-FIDIS RECIDIVA CIR fase1 salmon_results/totalRNA/13T-FIDIS/quant.sf
I tried these two design matrices:
y <- makeDGEList(gse) # gse is a SummarizedExperiment created using tximeta
# design matrix for running a first comparison Respuesta vs Recidiva
Subject <- factor(summarydata$sample)
Treat <- factor(summarydata$respuesta, levels=c("RESPUESTA","RECIDIVA"))
tissue <- factor(summarydata$tissue, levels=c("N", "T"))
# 1st try
design1 <- model.matrix(~Subject+tissue+Treat)
rownames(design1) <- colnames(y)
# 2nd try
design2 <- model.matrix(~Subject+Treat)
rownames(design2) <- colnames(y)
This is the error I get with either of the two design matrices above:
Error in glmFit.default(sely, design, offset = seloffset, dispersion = 0.05, :
Design matrix not of full rank. The following coefficients not estimable:
TreatRECIDIVA
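The error arises because respuesta is a between-patient factor: every patient has a single response status, so once Subject is in the model the TreatRECIDIVA column is exactly the sum of the indicator columns of the RECIDIVA patients and carries no separate information. Removing tissue (design2) does not help, for the same reason. The base-R sketch below rebuilds only the 12 rows shown above (an assumption; the full summarydata file has more patients) and checks the rank of design1 with qr().

```r
# Sketch: only the 12 rows shown in the question are reconstructed here (assumption)
summarydata <- data.frame(
  sample = rep(c("04", "10", "11FIDIS", "12", "12FIDIS", "13FIDIS"), each = 2),
  tissue = rep(c("N", "T"), times = 6),
  respuesta = rep(c("RESPUESTA", "RESPUESTA", "RECIDIVA",
                    "RESPUESTA", "RESPUESTA", "RECIDIVA"), each = 2)
)

Subject <- factor(summarydata$sample)
Treat   <- factor(summarydata$respuesta, levels = c("RESPUESTA", "RECIDIVA"))
tissue  <- factor(summarydata$tissue, levels = c("N", "T"))

design1 <- model.matrix(~ Subject + tissue + Treat)

# Compare the rank of the design with its number of columns (8 columns, rank 7)
qr1 <- qr(design1)
cat("columns:", ncol(design1), " rank:", qr1$rank, "\n")

# Columns pivoted beyond the rank are the non-estimable coefficients
aliased <- colnames(design1)[qr1$pivot[-seq_len(qr1$rank)]]
print(aliased)

# TreatRECIDIVA is exactly the sum of the two RECIDIVA patient columns
print(all(design1[, "TreatRECIDIVA"] ==
            design1[, "Subject11FIDIS"] + design1[, "Subject13FIDIS"]))
```

limma's nonEstimable() function reports the same information directly from a design matrix.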
Many thanks in advance.
Dear Gordon Smyth, thanks a lot for your help. I have tried to follow the steps for both edgeR and limma. For limma I followed a voom-limma approach and I get the error below. Maybe I am misunderstanding something very obvious.
The error I get is:
Error in contrasts.fit(fit, contrasts) : anyNA() applied to non-(list or vector) of type 'closure'
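In R, "closure" is the type of an ordinary function. One likely cause of this message (an assumption, since the code that produced it is not shown) is that no object named `contrasts` had been created, so the name resolved to the function `contrasts()` from the stats package, which is attached in every R session. The base-R snippet below reproduces the same message.

```r
# `contrasts` is a function in the stats package, so it always exists as a name
print(typeof(contrasts))

# Passing a function where a contrast matrix is expected fails inside anyNA()
msg <- tryCatch(anyNA(contrasts), error = function(e) conditionMessage(e))
print(msg)
```

Passing the contrast matrix that was actually created (for example, the object returned by limma's makeContrasts()) avoids this error.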
In case you need it, the summarydata file is available here: summarydata.txt
I have also tried the edgeR approach, but I still get the same "not of full rank" error. Here are the steps I am following:
In the limma approach, you've simply made a typo. You intended
instead of
By the way, you can simplify the code by using
in which case voom and duplicateCorrelation will be run automatically.
In the edgeR approach, you have mixed up the within-patient and between-patient effects. You need to create two effects, one for each patient group. You can't create separate effects for tumor and normal because tumor and normal are within-patient conditions. You need instead
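A minimal base-R sketch of this kind of design, modelled on the edgeR User's Guide section on comparisons both between and within subjects, is shown here; it is an illustration under stated assumptions, not necessarily the exact code recommended in this answer. It rebuilds only the 12 rows shown in the question, and Subject.N (patients renumbered within each response group) is a helper name introduced for illustration. Patient effects are nested within each response group and each group gets its own tumour-vs-normal effect; because the groups have different numbers of patients, some interaction columns are all zero and are dropped.

```r
# Sketch: only the 12 rows shown in the question are reconstructed (assumption)
summarydata <- data.frame(
  sample = rep(c("04", "10", "11FIDIS", "12", "12FIDIS", "13FIDIS"), each = 2),
  tissue = rep(c("N", "T"), times = 6),
  respuesta = rep(c("RESPUESTA", "RESPUESTA", "RECIDIVA",
                    "RESPUESTA", "RESPUESTA", "RECIDIVA"), each = 2)
)

Treat  <- factor(summarydata$respuesta, levels = c("RESPUESTA", "RECIDIVA"))
tissue <- factor(summarydata$tissue, levels = c("N", "T"))

# Hypothetical helper: patients renumbered 1, 2, ... within each response group
Subject.N <- integer(nrow(summarydata))
for (g in levels(Treat)) {
  idx <- Treat == g
  Subject.N[idx] <- as.integer(factor(summarydata$sample[idx]))
}
Subject.N <- factor(Subject.N)

# Patient effects nested within group, plus one tumour-vs-normal effect per group
design <- model.matrix(~ Treat + Treat:Subject.N + Treat:tissue)

# RECIDIVA has fewer patients, so some Treat:Subject.N columns are all zero
design <- design[, colSums(abs(design)) > 0, drop = FALSE]
print(colnames(design))
print(qr(design)$rank == ncol(design))  # TRUE when the design is of full rank

# Contrast: tumour-vs-normal change in RECIDIVA minus that in RESPUESTA
contrast <- setNames(numeric(ncol(design)), colnames(design))
contrast["TreatRECIDIVA:tissueT"]  <- 1
contrast["TreatRESPUESTA:tissueT"] <- -1
print(contrast)
```

In edgeR, a vector like `contrast` could then be passed to `glmQLFTest(fit, contrast = contrast)` after fitting the model with `glmQLFit(y, design)`.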
Thanks a lot, Gordon Smyth! I haven't tried the edgeR approach yet, but the limma one worked!
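For the second question (differences between treatments among responders), the same within/between logic can be applied to the responders only, with treatment as the between-patient factor. The sketch below assumes the question is whether the tumour-vs-normal change differs between treatments. It uses only the rows shown in the question (so only two treatments appear), converts the treatment labels to syntactic names (CIR+RT becomes CIR.RT), and builds one contrast per pair of treatments, so with three treatments it would give three contrasts. The names Trt, Subject.N and contr are illustrative.

```r
# Sketch: only the rows shown in the question are reconstructed (assumption)
summarydata <- data.frame(
  sample = rep(c("04", "10", "11FIDIS", "12", "12FIDIS", "13FIDIS"), each = 2),
  tissue = rep(c("N", "T"), times = 6),
  respuesta = rep(c("RESPUESTA", "RESPUESTA", "RECIDIVA",
                    "RESPUESTA", "RESPUESTA", "RECIDIVA"), each = 2),
  tratamiento = rep(c("CIR+RT", "CIR", "CIR", "CIR", "CIR+RT", "CIR"), each = 2)
)

# Keep responders only
resp <- summarydata[summarydata$respuesta == "RESPUESTA", ]

# Syntactic level names ("CIR+RT" -> "CIR.RT") are easier to use in contrasts
Trt    <- factor(make.names(resp$tratamiento))
tissue <- factor(resp$tissue, levels = c("N", "T"))

# Patients renumbered 1, 2, ... within each treatment group
Subject.N <- integer(nrow(resp))
for (g in levels(Trt)) {
  idx <- Trt == g
  Subject.N[idx] <- as.integer(factor(resp$sample[idx]))
}
Subject.N <- factor(Subject.N)

# Patients nested within treatment; one tumour-vs-normal effect per treatment
design <- model.matrix(~ Trt + Trt:Subject.N + Trt:tissue)
design <- design[, colSums(abs(design)) > 0, drop = FALSE]  # drop all-zero columns
stopifnot(qr(design)$rank == ncol(design))

# One contrast per pair of treatments: difference between their tissue effects
tis   <- grep(":tissueT$", colnames(design), value = TRUE)
pairs <- combn(tis, 2)
contr <- matrix(0, nrow = ncol(design), ncol = ncol(pairs),
                dimnames = list(colnames(design),
                                apply(pairs, 2, function(p) paste(p[2], "-", p[1]))))
for (k in seq_len(ncol(pairs))) {
  contr[pairs[2, k], k] <- 1
  contr[pairs[1, k], k] <- -1
}
print(contr)
```

Comparing tumour samples directly between treatments would be a different, purely between-patient question and would need a different design.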