The short answer is no. For the longer answer, you'll need to go with the age-old rule of thumb: keep all experimental conditions the same, except the variable that's relevant to your hypothesis. In the realm of sequencing and high-throughput experimentation, technical variability plays a huge role.
Three things to consider when designing a sequencing / high-throughput experiment: batches, platform, and sample type placement.
To absorb a given nuisance effect, there needs to be balance on your primary term of interest. Take the following design as an example:
> pheno
Sample ID SampleType
SAM1 A
SAM2 A
SAM3 A
SAM4 B
SAM5 B
SAM6 B
These are all from the same sequencer, with no nuisance variables that need to be accounted for, so the design matrix is simply model.matrix(~0 + SampleType, data = pheno).
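For completeness, here's a minimal sketch in R of how that matrix could be built. The pheno data frame below is just a mock-up of the table above (I'm using model.matrix, the base R function for building design matrices):

# Mock-up of the pheno table above
pheno <- data.frame(
  SampleID   = paste0("SAM", 1:6),
  SampleType = factor(rep(c("A", "B"), each = 3))
)

# One column per SampleType level, no intercept
design <- model.matrix(~0 + SampleType, data = pheno)
design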
Now consider a known batch effect such as this:
> pheno.batch
Sample ID SampleType Batch
SAM1 A B1
SAM2 A B1
SAM3 A B2
SAM4 A B2
SAM5 B B1
SAM6 B B1
SAM7 B B2
SAM8 B B2
We'd consider this experiment to be well balanced: both SampleType levels, A and B, appear in both batch B1 and batch B2. This balance means that variation can be estimated for SampleType and Batch separately, with the following design matrix: model.matrix(~0 + SampleType + Batch, data = pheno.batch).
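As a sketch, the balanced design above could be coded like this (mock data matching the pheno.batch table; the qr() check at the end is just one way to confirm the matrix is of full rank, i.e. that both terms are estimable):

# Mock-up of pheno.batch: both SampleType levels appear in both batches
pheno.batch <- data.frame(
  SampleID   = paste0("SAM", 1:8),
  SampleType = factor(rep(c("A", "B"), each = 4)),
  Batch      = factor(rep(c("B1", "B1", "B2", "B2"), times = 2))
)

design <- model.matrix(~0 + SampleType + Batch, data = pheno.batch)

# Full rank: the number of independent columns equals the number of columns
qr(design)$rank == ncol(design)   # TRUE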
Next, here's an example of an unbalanced design, where the design matrix will not be of full rank; in other words, at least one term in the design matrix is a linear combination of the others. In this case, the SampleType and Batch columns are perfectly confounded: every sample of type A is in batch B1, and every sample of type B is in batch B2.
> pheno.batch2
Sample ID SampleType Batch
SAM1 A B1
SAM2 A B1
SAM3 A B1
SAM4 A B1
SAM5 B B2
SAM6 B B2
SAM7 B B2
SAM8 B B2
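Here's the same sort of sketch for pheno.batch2 (again mock data matching the table), showing where the confounding bites:

# Mock-up of pheno.batch2: SampleType and Batch are perfectly confounded
pheno.batch2 <- data.frame(
  SampleID   = paste0("SAM", 1:8),
  SampleType = factor(rep(c("A", "B"), each = 4)),
  Batch      = factor(rep(c("B1", "B2"), each = 4))
)

design <- model.matrix(~0 + SampleType + Batch, data = pheno.batch2)

# Rank-deficient: the BatchB2 column is an exact copy of the SampleTypeB column,
# so there are fewer independent columns than columns overall
qr(design)$rank == ncol(design)   # FALSE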
When running model.matrix(~0 + SampleType + Batch, data = pheno.batch2), the resulting matrix will not be of full rank, and downstream model fitting will fail or warn that some coefficients are not estimable. Now that we understand the difference between a balanced and an unbalanced design, let's take your OP as an example:
> pheno.batch.op
Sample ID SampleType Platform
SAM1 Healthy_Control HT12v4
SAM2 Healthy_Control HT12v4
SAM3 Healthy_Control HT12v4
SAM4 Healthy_Control HT12v4
SAM5 Disease HT12v3
SAM6 Disease HT12v3
SAM7 Disease HT12v3
SAM8 Disease HT12v3
In this case, Platform is essentially the same as SampleType, so we can't untangle that variation. At an abstract level, that means we can't statistically distinguish variation coming from the difference in platform from biological variation coming from the sample types. These experiments are highly sensitive to change, and that's before accounting for library preps, sample preparation, source biological material, kits used, temperature in the room, the person who prepped the samples, etc.
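To make that concrete, here's a toy sketch using a mock-up of your table and a made-up expression vector (not real data): lm() simply can't assign a separate coefficient to Platform, because it's a perfect copy of SampleType.

# Mock-up of the table in the question
pheno.batch.op <- data.frame(
  SampleID   = paste0("SAM", 1:8),
  SampleType = factor(rep(c("Healthy_Control", "Disease"), each = 4)),
  Platform   = factor(rep(c("HT12v4", "HT12v3"), each = 4))
)

# Toy model for a single made-up gene: the Platform coefficient comes back NA
# because it is aliased with SampleType
set.seed(1)
expr <- rnorm(8)
coef(lm(expr ~ SampleType + Platform, data = pheno.batch.op))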
Dear andrew.j.skelton73,
I really appreciated your comment; it was very enlightening.
My sincere thanks. Leite
Hi Andrew,
I have a similar case. I have a two-factor design (genotypes and treatment effects). My control samples are from Novaseq and my treated samples are from Hiseq. Is there a way to deal with this situation? According to your previous comments, there is no way to handle the unbalanced design.
Thanks