Hello all,
I would like to create a design for differential gene expression analysis along the lines of
model.matrix(~0 + group + sex + batch)
but one where 'batch' is a nested variable that accounts for the library preparation batch, the fact that some samples were re-run and their data combined to reach an adequate number of counts, and the particular flow cells on which the one (or two) runs of each sample were performed.
I have not used such a nested design before, and although I have found some good material on the subject, I am still unsure exactly how to encode such a complex nesting variable. I would very much appreciate any advice or examples that more experienced readers could provide.
Many thanks in advance!
If you ran the exact same library more than once, just combine the counts. Re-sequencing the same library is not a significant source of technical artifacts, and neither is the flow cell.
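As a rough sketch of what that could look like in base R: sum the counts of runs that came from the same library, then build the design from one row per library. Everything below is made up for illustration (the toy counts, the `runs` table, and its column names `library`, `group`, `sex`, `batch`), and treating `batch` as the library preparation batch only is my reading of the question, not something your data confirms.

```r
# Hypothetical toy data: 3 genes x 8 sequencing runs from 6 libraries.
# libA and libD were each sequenced twice; all names are illustrative.
counts <- matrix(
  c(10, 12, 15,  9, 20, 22, 30, 28,
     0,  1,  3,  2,  2,  4,  1,  0,
    50, 48, 60, 55, 53, 61, 47, 52),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("run", 1:8))
)

# One row per run; 'batch' here means library preparation batch (assumption).
runs <- data.frame(
  run     = colnames(counts),
  library = c("libA", "libA", "libB", "libC", "libD", "libD", "libE", "libF"),
  group   = c("ctrl", "ctrl", "ctrl", "ctrl", "trt",  "trt",  "trt",  "trt"),
  sex     = c("F",    "F",    "M",    "F",    "M",    "M",    "F",    "M"),
  batch   = c("prep1", "prep1", "prep1", "prep2", "prep1", "prep1", "prep2", "prep2")
)

# Sum counts over runs of the same library.
# rowsum() groups rows, so transpose to runs x genes and back again.
summed <- t(rowsum(t(counts), group = runs$library))

# Collapse the run table to one row per library, in the column order of 'summed'.
samples <- unique(runs[, c("library", "group", "sex", "batch")])
samples <- samples[match(colnames(summed), samples$library), ]

# Same formula as in the question, with no run or flow cell term.
design <- model.matrix(~0 + group + sex + batch, data = samples)
```

If you are already using edgeR or DESeq2, `sumTechReps()` and `collapseReplicates()` respectively should do the summing step for you. It is worth checking that the design is full rank, for example with `qr(design)$rank == ncol(design)`, before fitting.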