Hello,
I will be working with approximately 40 bulk RNA-seq samples that include one treatment factor with four levels. However, due to logistical constraints, RNA extraction will be performed on, for example, either Monday or Tuesday, and sequencing library preparation will occur on either Thursday or Friday. Each of these steps will require two days to complete.
I want to avoid confounding the treatment factor with these nuisance factors (RNA extraction day and library prep day), but I'm unsure how best to assign samples across these time points to account for the batch effects (different days for RNA isolation and library prep). I'm also a bit concerned about ending up with a singular model matrix, i.e. one that is not full rank.
Here is a sample metadata table that mimics the experiment design.
| sample_name | treatment | RNA_extraction_day | library_prep_day | RNA_extract_tech | lib_prep_tech |
| ----------- | --------- | ------------------ | ---------------- | ---------------- | ------------- |
| S1 | con | A | C | techA | techB |
| S2 | con | A | C | techA | techB |
| S3 | con | B | C | techA | techB |
| S4 | con | B | C | techA | techB |
| S5 | stage_1 | A | C | techA | techB |
| S6 | stage_1 | A | C | techA | techB |
| S7 | stage_1 | B | C | techA | techB |
| S8 | stage_1 | B | C | techA | techB |
| S9 | stage_1 | B | C | techA | techB |
| S10 | stage_2 | A | D | techA | techB |
| S11 | stage_2 | A | D | techA | techB |
| S12 | stage_2 | A | D | techA | techB |
| S13 | stage_2 | B | D | techA | techB |
| S14 | stage_2 | B | D | techA | techB |
| S15 | stage_2 | B | D | techA | techB |
| S16 | stage_3 | A | D | techA | techB |
| S17 | stage_3 | A | D | techA | techB |
| S18 | stage_3 | A | D | techA | techB |
| S19 | stage_3 | B | D | techA | techB |
| S20 | stage_3 | B | D | techA | techB |
I would appreciate any advice on how to better structure the experiment. The RNA extractions and library preps will be performed by a sequencing core and a DNA/RNA extraction core.
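One way to check the rank concern before committing to a layout is to build the design matrix from the planned metadata and compare its rank to its number of columns. The sketch below does this in Python with NumPy for the 20 example rows above. It assumes an additive model `~ treatment + RNA_extraction_day + library_prep_day` with reference-level (treatment contrast) coding; the helper `dummy_columns` is a hypothetical name introduced only for illustration.

```python
import numpy as np

# (treatment, RNA_extraction_day, library_prep_day) for S1..S20, copied from the table
samples = (
    [("con", "A", "C")] * 2 + [("con", "B", "C")] * 2
    + [("stage_1", "A", "C")] * 2 + [("stage_1", "B", "C")] * 3
    + [("stage_2", "A", "D")] * 3 + [("stage_2", "B", "D")] * 3
    + [("stage_3", "A", "D")] * 3 + [("stage_3", "B", "D")] * 2
)


def dummy_columns(values, reference):
    """Indicator columns for every level except the reference (hypothetical helper)."""
    levels = [lvl for lvl in sorted(set(values)) if lvl != reference]
    return np.array([[1.0 if v == lvl else 0.0 for lvl in levels] for v in values])


treatment, rna_day, lib_day = zip(*samples)

# Columns: intercept, stage_1, stage_2, stage_3, RNA day B, library prep day D
design = np.hstack([
    np.ones((len(samples), 1)),
    dummy_columns(treatment, "con"),
    dummy_columns(rna_day, "A"),
    dummy_columns(lib_day, "C"),
])

rank = np.linalg.matrix_rank(design)
print(f"columns: {design.shape[1]}, rank: {rank}")  # columns: 6, rank: 5
```

In this example layout the library prep day D column equals the sum of the `stage_2` and `stage_3` columns, because those two treatments were prepped only on day D, so the matrix is rank-deficient (rank 5 for 6 columns).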
Thanks, ATpoint. But I'm not sure if I understand your suggestion. Are you suggesting the following design?
In my head, the model `~ treatment + merged_days` would probably give me a full-rank model matrix :)
Yes, that is what they're suggesting, as it is the simplest model for what you want (and makes the `RNA_extraction_day` and `library_prep_day` factors functionally moot).
Yes. As jared.andrews07 says.
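To illustrate why the merged model can be full rank, here is a minimal Python/NumPy sketch. It assumes `merged_days` is a two-level batch factor, each level pairing one RNA extraction day with one library prep day, and that every treatment is split evenly across both batches. The batch labels and the 5-samples-per-cell split are hypothetical and not taken from the thread.

```python
import numpy as np
from itertools import product

treatments = ["con", "stage_1", "stage_2", "stage_3"]
merged_days = ["batch_1", "batch_2"]  # hypothetical labels for the merged day factor

# Assumed layout: 5 samples per treatment in each batch, 40 samples in total
samples = [(t, b) for t, b in product(treatments, merged_days) for _ in range(5)]

# Design matrix for ~ treatment + merged_days with reference levels "con" and "batch_1"
design = np.array([
    [1.0]                                            # intercept
    + [float(t == lvl) for lvl in treatments[1:]]    # stage_1, stage_2, stage_3
    + [float(b == "batch_2")]                        # merged_days: batch_2
    for t, b in samples
])

rank = np.linalg.matrix_rank(design)
print(f"columns: {design.shape[1]}, rank: {rank}")  # columns: 5, rank: 5
```

The matrix is full rank here because both batches contain samples from every treatment. If instead some treatments were processed only in one batch and the rest only in the other, the batch column would again be a sum of treatment columns and the rank would drop.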