Hello,
I have RNA-seq data generated from brain tissue of 100 individuals, half of whom were affected by a disease. The samples are a mix of two different brain regions and were obtained from 4 different providers. I am trying to run DESeq2 to identify differentially expressed genes using a model like: ~age + sex + tissue + PMI + condition, where PMI is the post-mortem interval. My colData looks like this:
          condition  age  sex  tissue  PMI     RIN       brain_bank
A016-93   disease    79   M    FCBA89  48.00   6.400000  LND
A032-95   disease    71   F    FCBA89  65.00   6.550000  LND
A046-94   disease    70   F    FCBA89  100.00  5.400000  LND
AN00216   disease    63   M    FCBA89  26.16   6.150000  KBC
AN02738   disease    85   F    FCBA89  12.80   6.200000  KBC
SD001-07  disease    37   F    IF      46.00   6.200000  MII
.
.
.
AN03019   control    47   F    FCBA89  31.85   5.300000  KBC
SD004-10  control    50   M    IF      63.00   5.400000  MII
A046-91   control    41   M    IF      49.00   5.450000  LND
Here condition, sex, tissue and brain_bank are factors, and age, RIN and PMI are numeric variables. The RIN values are averages because all the samples were sequenced in duplicate.
I want to include brain_bank as a covariate in the model, but when I try this: ~age + sex + tissue + brain_bank + PMI + condition, it gives me the following error:
the design formula contains a numeric variable with integer values,
specifying a model with increasing fold change for higher values.
did you mean for this to be a factor? if so, first convert
this variable to a factor using the factor() function
Error in checkFullRank(modelMatrix) :
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear
combinations of the others and must be removed.
I have seen this error several times, but being a novice in statistics I have never been able to understand it. If anyone can suggest good reading material about it, that would be great.
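For readers who hit the same error, here is a minimal sketch of what "not full rank" means. It is written in plain Python rather than R, and it does not call DESeq2. The six-sample table is made up for illustration and is not the colData above: it assumes that every MII sample comes from tissue IF and that all other samples come from FCBA89. The helper names (`dummy_columns`, `matrix_rank`, `design_matrix`) are hypothetical. The sketch builds the 0/1 columns that a design formula expands into and compares the matrix rank with the number of columns.

```python
from fractions import Fraction


def dummy_columns(values):
    """Reference-level coding: one 0/1 column per level except the first (sorted) level."""
    levels = sorted(set(values))
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in levels[1:]}


def matrix_rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # this column is a linear combination of earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design_matrix(samples, factors):
    """Intercept plus dummy columns for each factor, returned as one row per sample."""
    columns = [[1] * len(samples)]  # intercept column
    for name in factors:
        columns.extend(dummy_columns([s[name] for s in samples]).values())
    return [list(row) for row in zip(*columns)]


# Hypothetical toy data (NOT the real colData): brain_bank MII is the only
# source of tissue IF, so the tissueIF and brain_bankMII columns are identical.
samples = [
    {"condition": "disease", "tissue": "FCBA89", "brain_bank": "LND"},
    {"condition": "control", "tissue": "FCBA89", "brain_bank": "LND"},
    {"condition": "disease", "tissue": "FCBA89", "brain_bank": "KBC"},
    {"condition": "control", "tissue": "FCBA89", "brain_bank": "KBC"},
    {"condition": "disease", "tissue": "IF", "brain_bank": "MII"},
    {"condition": "control", "tissue": "IF", "brain_bank": "MII"},
]

for factors in (["tissue", "condition"], ["tissue", "brain_bank", "condition"]):
    design = design_matrix(samples, factors)
    print(factors, "columns:", len(design[0]), "rank:", matrix_rank(design))
```

In this toy table the second design has more columns than its rank, because one column repeats information already carried by another; that is the situation the error message describes. Whether the real data are confounded in this particular way is an assumption; cross-tabulating the factors in colData against each other would show which combinations are empty.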
Anyway, these are my questions:
1) Is there some way to incorporate brain_bank as a covariate in the analysis, or should I simply go with the first model?
2) Should I use the mean RIN values in the model to account for their effect? Any other suggestions?
3) I am not sure whether this should be asked as a separate question. I also want to run DEXSeq to find differential exon usage between cases and controls. Can I use the following models?
~sample + exon + exon:condition + age:exon + sex:exon + tissue:exon + PMI:exon ## Full formula
~sample + exon + age:exon + sex:exon + tissue:exon + PMI:exon ## Reduced formula
Thanks in advance for helping.
Are you sure that age is incorporated as a factor in the model? I doubt it, given this warning message:
the design formula contains a numeric variable with integer values.
No, my mistake: age is included as numeric. I have corrected the post.
Hello fizer!
It appears that your post has been cross-posted to another site: https://support.bioconductor.org/p/90778/
This is typically not recommended as it runs the risk of annoying people in both communities.
Hi Wouter,
I was not getting any feedback earlier, so I thought maybe I should try another site. I was not aware that cross-posting could be annoying. Thanks.