aUser, 3 months ago
Hi everyone,
Can we treat outliers as a batch variable in linear modeling, e.g. in DESeq2? I know outliers and batches are not the same thing, but can I argue that "the outlier samples were processed differently" and therefore qualify as a separate batch? I do not want to remove the outliers (flagged from the PCA as PC1 > 200; their actual values are around 600 along PC1, and there are ~20 such samples). I want to keep them for the DEG calculation. I looked for resources where this has been discussed, but maybe I missed them.
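Treating the outliers as a batch amounts to adding an indicator covariate to the design, e.g. `~ outlier_group + condition` in DESeq2. A minimal Python sketch of that design matrix (Python stands in for R here; `outlier_group`, the function names, and coding NT as the reference level are all assumptions) shows the catch: the indicator is only usable if it is not confounded with the tumor/normal contrast. If every outlier is a tumor and every tumor is an outlier, the matrix loses rank and the two effects cannot be separated; DESeq2 reports such a design as not full rank.

```python
import numpy as np

def design_matrix(condition, outlier_flag):
    """Build a hypothetical ~ outlier_group + condition design.

    condition: labels "NT" or "TP" (NT is assumed to be the reference level)
    outlier_flag: True for samples flagged as PCA outliers
    Columns: intercept, outlier indicator, tumor indicator.
    """
    n = len(condition)
    intercept = np.ones(n)
    is_outlier = np.array(outlier_flag, dtype=float)
    is_tumor = np.array([c == "TP" for c in condition], dtype=float)
    return np.column_stack([intercept, is_outlier, is_tumor])

def is_full_rank(X):
    """True if every coefficient in the design can be estimated."""
    return np.linalg.matrix_rank(X) == X.shape[1]

cond = ["NT", "NT", "TP", "TP", "TP", "TP"]
# Outliers spread over both groups: the outlier effect is estimable.
print(is_full_rank(design_matrix(cond, [False, True, False, True, False, False])))
# Outliers are exactly the tumors: outlier and condition effects are confounded.
print(is_full_rank(design_matrix(cond, [False, False, True, True, True, True])))
```

Even when the matrix is full rank, an outlier covariate that falls mostly within one group is estimated from few samples, so it can still absorb part of the tumor/normal difference.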
Thank you for your input/comment.
Can you provide more context, and also show your PCA? Were there replicates of each sample?
I'm not sure I follow your logic. Outliers in linear models are individual samples that deviate from the expected distribution. If, for example, the outliers were processed differently or were all collected on the same sampling day, then there could be a technical batch effect.
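One way to test the "processed differently" idea is to check whether the flagged samples concentrate in a recorded technical variable. In TCGA, the aliquot barcode (TCGA-TSS-participant-sample/vial-portion/analyte-plate-center) carries the plate, which is often used as a batch proxy. A minimal sketch, assuming full aliquot barcodes are the sample names; the function names and the barcodes in the example are hypothetical.

```python
from collections import Counter

def plate_of(barcode):
    """Return the plate field (sixth dash-separated field) of a TCGA aliquot barcode."""
    fields = barcode.split("-")
    if len(fields) < 7:
        raise ValueError(f"not a full aliquot barcode: {barcode}")
    return fields[5]

def outliers_by_plate(barcodes, outlier_flag):
    """Map each plate to (number of flagged outliers, total samples on that plate)."""
    total = Counter(plate_of(b) for b in barcodes)
    flagged = Counter(plate_of(b) for b, f in zip(barcodes, outlier_flag) if f)
    # Counter returns 0 for plates with no flagged samples
    return {plate: (flagged[plate], total[plate]) for plate in sorted(total)}

# Hypothetical barcodes, for illustration only
barcodes = [
    "TCGA-XX-0001-01A-11R-P001-07",
    "TCGA-XX-0002-01A-11R-P001-07",
    "TCGA-XX-0003-11A-11R-P002-07",
]
print(outliers_by_plate(barcodes, [True, True, False]))
```

If most of the ~20 outliers sit on one or two plates, a plate or batch term is easier to justify than a covariate defined from the PCA itself.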
Thank you for your response, and sorry for the late reply; we had holidays here.
I am working with the TCGA-LUAD data set, and the samples were processed and normalized with DESeq2. The steps are given below:
For PCA:
Samples with PC1 > 200 are considered outliers (as suggested in the literature).
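The PC1 > 200 rule can be sketched with NumPy as below, assuming samples in rows, genes in columns, and data that is already transformed (e.g. DESeq2 `vst` output); the function names are hypothetical. Two caveats are built in: the sign of a principal component is arbitrary, so the sketch orients PC1 so the most extreme sample is positive, and the absolute scale depends on the transform and the genes used (DESeq2's `plotPCA` uses the 500 most variable genes by default), so a fixed cutoff like 200 does not transfer between analyses.

```python
import numpy as np

def pc1_scores(expr):
    """PC1 scores for the rows (samples) of a samples x genes matrix."""
    centered = expr - expr.mean(axis=0)
    # SVD of the centered matrix: U[:, 0] * S[0] are the PC1 scores
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, 0] * s[0]
    # PC signs are arbitrary; orient so the most extreme sample is positive
    if scores[np.argmax(np.abs(scores))] < 0:
        scores = -scores
    return scores

def flag_pc1_outliers(expr, threshold=200.0):
    """Boolean mask of samples whose PC1 score exceeds the threshold."""
    return pc1_scores(expr) > threshold

# Toy data: three similar samples and one far-away sample
expr = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1000, 1000, 1000]], dtype=float)
print(flag_pc1_outliers(expr).tolist())
```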
The PCA figure is attached. NT denotes normal samples and TP denotes tumor samples.
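Before adding an outlier covariate, it helps to see how the ~20 flagged samples split between NT and TP. In TCGA barcodes the first two digits of the fourth field give the sample type (01 is primary solid tumor, shown as TP; 11 is solid tissue normal, shown as NT). A minimal sketch, assuming barcodes are the sample names; the function names and example barcodes are hypothetical.

```python
from collections import Counter

# TCGA sample-type codes mapped to the labels used in the PCA plot
SAMPLE_TYPE = {"01": "TP", "11": "NT"}

def sample_type(barcode):
    """Decode the sample type from the fourth barcode field, e.g. '01A' -> 'TP'."""
    code = barcode.split("-")[3][:2]
    return SAMPLE_TYPE.get(code, "other")

def outlier_crosstab(barcodes, outlier_flag):
    """Count samples per (sample type, flagged as outlier) pair."""
    return Counter((sample_type(b), bool(f)) for b, f in zip(barcodes, outlier_flag))

# Hypothetical barcodes, for illustration only
barcodes = ["TCGA-XX-0001-01A", "TCGA-XX-0002-11A", "TCGA-XX-0003-01A"]
print(outlier_crosstab(barcodes, [True, True, False]))
```

If nearly all outliers fall in one group, an outlier covariate would be largely confounded with the tumor/normal comparison.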