I’m working with some data with large BCV (indicated below).
I built my design matrix to account for every covariate available in my metadata. I applied conservative expression filtering (e.g. >= 0.75 log(CPM + 1) in >= k% of samples, as described in the voom guides and various Biostars/Bioconductor posts).
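To make the filtering rule concrete, here is a minimal sketch of how I understand it, written in plain Python rather than the R code I actually ran. The function name `filter_genes`, the toy counts, the use of log base 2, and the value `min_fraction=0.5` standing in for k% are all assumptions for illustration, not my real settings.

```python
import math

def filter_genes(counts, threshold=0.75, min_fraction=0.5):
    """Keep genes whose log2(CPM + 1) is >= threshold in at least
    min_fraction of samples.

    counts: dict mapping gene name -> list of raw counts (one per sample).
    threshold and min_fraction are illustrative placeholders; log base 2
    is an assumption, since the rule above does not state the base.
    """
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    # Library size = total counts per sample (assumed non-zero here).
    lib_sizes = [sum(counts[g][j] for g in genes) for j in range(n_samples)]

    kept = []
    for g in genes:
        n_pass = 0
        for j in range(n_samples):
            cpm = counts[g][j] / lib_sizes[j] * 1e6
            if math.log2(cpm + 1) >= threshold:
                n_pass += 1
        if n_pass / n_samples >= min_fraction:
            kept.append(g)
    return kept

# Toy data: geneB is detected in only one of four samples.
toy_counts = {
    "geneA": [500, 600, 550, 700],
    "geneB": [0, 1, 0, 0],
    "geneC": [499500, 399399, 449450, 599300],
}
kept_default = filter_genes(toy_counts)
kept_lenient = filter_genes(toy_counts, min_fraction=0.25)
print(kept_default)
print(kept_lenient)
```

With the stricter fraction geneB is dropped because it passes the threshold in only one sample; lowering the fraction to 0.25 lets it through, which is the kind of low-count gene I would expect to shape the left end of the mean-variance trend.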
I compared the mean-variance trends to Fig. 1 of the Law et al. 2014 voom paper. To my eye, both curves have a faint upward kink around (3, 1) when compared with Law et al. Fig. 1. I have read mixed opinions about this in various posts, so I wanted to ask here. The kink does not look like the ones people describe as resulting from under-filtering.
Question: What causes these kinks? Should I filter out more genes, or is the kink around (3, 1) not due to dropout? Is it OK to leave as is, or does it suggest another issue that should be addressed?
Comparison 1: whole blood vs. PBMC from sick and healthy patients (BCV = 0.612) - I felt this was closest to Fig. 1C in Law et al.
Comparison 2: saliva (healthy only) vs. whole blood (sick/healthy) vs. PBMC (sick/healthy) (BCV = 0.636) - I felt this was closest to Fig. 1E in Law et al.
I decided to run separate DE analyses because I'm not comparing genes between comparison 1 and comparison 2, and because some covariates do not apply to the third sample type and I can't have NaN in the design matrix.
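To illustrate the NaN problem, here is a small sketch of the check that led me to split the analyses, again in plain Python rather than R. The sample records, the helper names `is_missing` and `usable_covariates`, and the covariate `covariate_x` are hypothetical stand-ins, not my real metadata.

```python
import math

def is_missing(value):
    """True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def usable_covariates(samples, tissues, covariates):
    """Return the covariates that have no missing value in any sample
    belonging to the given tissues, i.e. those that could enter one
    shared design matrix for that comparison."""
    subset = [s for s in samples if s["tissue"] in tissues]
    return [
        c for c in covariates
        if not any(is_missing(s.get(c)) for s in subset)
    ]

# Hypothetical metadata: covariate_x is undefined for saliva.
samples = [
    {"tissue": "whole_blood", "status": "sick", "covariate_x": 1.2},
    {"tissue": "PBMC", "status": "healthy", "covariate_x": 0.7},
    {"tissue": "saliva", "status": "healthy", "covariate_x": float("nan")},
]
covariates = ["status", "covariate_x"]

two_tissue = usable_covariates(samples, {"whole_blood", "PBMC"}, covariates)
three_tissue = usable_covariates(
    samples, {"whole_blood", "PBMC", "saliva"}, covariates
)
print(two_tissue)
print(three_tissue)
```

In this toy case the two-tissue comparison can keep both covariates, while the three-tissue comparison can only keep the one that is defined for every sample, which is why I fit the two comparisons with different design matrices.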