Hi everyone,
I have a question about the design matrix in DESeq2. I have RNA-seq samples from multiple labs, so lab is essentially a batch effect.
Batch: 5 labs, 29 samples in total; each has at least two replicates
RunType: some are single-end, some are paired-end
Condition: some are wild type and some are knockout
How do I make a design matrix in DESeq2?
I thought of some possible combinations:
design <- model.matrix(~Batch + Condition + RunType)
design <- model.matrix(~Batch + Condition:Batch + RunType:Batch)
design <- model.matrix(~Condition + Batch + RunType)
design <- model.matrix(~Condition + Batch:Condition + RunType:Condition)
Which one is correct for removing any batch effect that is present?
Or are there other possible combinations that I am missing? I am not sure how to model the design with three factors like these.
I am also not sure whether I need to include interactions among the three factors.
Please help.
Thanks
Impossible for anyone to know, really. Each experiment is unique and much 'back and forth' in the analysis is required.
I would start with:
Then, check PCA bi-plots for each variable in the design. If, for example, RunType has no apparent effect, then remove it from the model. I don't see any need for an interaction term in this case.
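As a rough illustration of checking each design variable against the leading principal component, here is a minimal base R sketch. It is only an assumption-laden example: the sample sheet `coldata` and the simulated matrix `expr` are hypothetical stand-ins for the real annotation and for variance-stabilised expression values, and the R-squared summary is a simple substitute for inspecting bi-plots by eye.

```r
set.seed(1)

# Hypothetical sample sheet (8 samples); substitute the real annotation.
coldata <- data.frame(
  Batch     = factor(rep(c("lab1", "lab2"), each = 4)),
  RunType   = factor(rep(c("single", "paired"), times = 4)),
  Condition = factor(rep(c("WT", "WT", "KO", "KO"), times = 2),
                     levels = c("WT", "KO"))
)

# Simulated log-scale expression (500 genes x 8 samples) with an
# artificial lab shift added to the first 200 genes.
expr <- matrix(rnorm(500 * 8), nrow = 500)
expr[1:200, coldata$Batch == "lab2"] <- expr[1:200, coldata$Batch == "lab2"] + 3

# PCA on samples (rows of t(expr)).
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Fraction of PC1 variance explained by each variable in the design.
r2 <- sapply(names(coldata), function(v) {
  summary(lm(pca$x[, 1] ~ coldata[[v]]))$r.squared
})
print(round(r2, 2))
```

A variable whose R-squared is near zero on the leading components is a candidate for dropping from the model, in the spirit of the suggestion above.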
I think you can merge the Batch and RunType information into a single factor; that would be your batch effect. Then use ~Batch + Condition.
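A minimal sketch of merging the two variables, assuming a hypothetical sample sheet `coldata`; the combined column name `BatchRun` is made up for illustration. It uses base R's `interaction()` and then checks that the resulting design matrix is full rank.

```r
# Hypothetical sample sheet; substitute the real 29-sample annotation.
coldata <- data.frame(
  Batch     = factor(c("lab1", "lab1", "lab2", "lab2",
                       "lab3", "lab3", "lab3", "lab3")),
  RunType   = factor(c("single", "single", "paired", "paired",
                       "single", "single", "paired", "paired")),
  Condition = factor(c("WT", "KO", "WT", "KO", "WT", "KO", "WT", "KO"),
                     levels = c("WT", "KO"))
)

# One combined batch factor; drop = TRUE removes lab/run-type
# combinations that do not occur in the data.
coldata$BatchRun <- interaction(coldata$Batch, coldata$RunType, drop = TRUE)
print(levels(coldata$BatchRun))

# Additive design with the merged batch factor.
design <- model.matrix(~ BatchRun + Condition, data = coldata)

# TRUE when no column is a linear combination of the others.
full_rank <- qr(design)$rank == ncol(design)
print(full_rank)
```

The same formula, `~ BatchRun + Condition`, could then be passed as the design when building the DESeq2 dataset, provided the real sample sheet also gives a full-rank matrix.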
Do you have replicates of each condition in every batch? If not, then I doubt you can (or should) try to compare these. At the very least, treat everything as single-end, which eliminates that confounder. Can you post a table that indicates which condition each sample belongs to and which lab it comes from?
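One quick way to answer that question is a cross-tabulation of lab against condition. This is a sketch on a hypothetical sample sheet `coldata`, not the poster's actual data:

```r
# Hypothetical sample sheet; substitute the real annotation.
coldata <- data.frame(
  Batch     = factor(c("lab1", "lab1", "lab2", "lab2",
                       "lab3", "lab3", "lab3", "lab3")),
  Condition = factor(c("WT", "WT", "KO", "KO", "WT", "WT", "KO", "KO"),
                     levels = c("WT", "KO"))
)

# Samples per lab and condition; a zero cell means that lab
# has no samples for that condition.
tab <- table(coldata$Batch, coldata$Condition)
print(tab)

# Labs that contain every condition.
complete_labs <- rownames(tab)[rowSums(tab > 0) == ncol(tab)]
print(complete_labs)  # "lab3"
```

In this toy example only lab3 contains both WT and KO, so it is the only lab where the two conditions can be compared within a batch.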
No, not every batch contains every condition. Is there any alternative that works around this limitation? Here is the table
The other possibility I thought of is to look at a set of about 50-60 genes. These genes are markers of cell types E and N. I want to see the expression of only these genes, as a heatmap, across my collected datasets. I would divide the normalized counts by the expression of a loading control such as beta-actin, then compare the expression of only these genes across the samples. This way I can at least see how my KO samples differ from WT for these specific marker genes, which may reveal whether the KO has impaired expression of the marker genes. Does this make sense?
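To make the idea concrete, here is a minimal base R sketch of the calculation described above. Everything in it is hypothetical: the matrix `norm_counts` is simulated, the gene names `marker1` to `marker5` are placeholders, and `ACTB` stands in for the beta-actin row. Note that dividing by a single control gene is not guaranteed to remove lab-to-lab differences.

```r
set.seed(1)

# Hypothetical normalized counts: rows = genes, columns = samples.
genes <- c("ACTB", paste0("marker", 1:5))
norm_counts <- matrix(
  rpois(6 * 6, lambda = 200), nrow = 6,
  dimnames = list(genes, paste0("sample", 1:6))
)

markers <- setdiff(genes, "ACTB")

# Divide each sample's marker values by that sample's ACTB value
# (pseudocount of 1 avoids division by zero), then take log2.
ratio <- sweep(norm_counts[markers, ] + 1, 2, norm_counts["ACTB", ] + 1, "/")
log_ratio <- log2(ratio)

# Row-scaled heatmap of the marker genes; columns kept in sample order.
heatmap(log_ratio, scale = "row", Colv = NA)
```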
Please help