Question

Dealing with outliers for differential expression in proteomics

7 months ago

Jorge HB

I have 15 samples (3 replicates per condition) with intensity values for ~9,500 proteins. Some samples have missing values (15% in the worst case), and imputation was performed prior to the differential expression analysis (DEA).

Looking at my imputed data in a PCA plot, I think there are some outliers that may bias my DEA results (plot not reproduced here).

Looking at my DEA results, I find that the contrasts involving the red group may be somewhat inflated. For example, comparing the red group with the green group, I get ~1,500 significant proteins. The people in charge of the project would prefer not to remove any replicates, given the small number of samples per condition.

Would it be a valid approach to run the DE analysis with all samples, run it again with those two outliers removed, and keep as significant only the proteins that overlap between the two analyses?
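A minimal sketch of the overlap idea, written in Python rather than R/limma purely for illustration. It assumes each analysis yields one raw p-value per protein in the same order, applies a Benjamini-Hochberg adjustment (implemented by hand here) and calls a protein significant at an adjusted p-value below 0.05. The cutoff, function names and protein IDs are all hypothetical.

```python
import numpy as np

def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values (same order as the input)."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adj = np.empty(n)
    adj[order] = np.minimum(ranked, 1.0)
    return adj

def significant_set(ids, pvalues, alpha=0.05):
    """IDs whose BH-adjusted p-value is below alpha (hypothetical cutoff)."""
    return {i for i, q in zip(ids, bh_adjust(pvalues)) if q < alpha}

def overlap(ids, p_all, p_without_outliers, alpha=0.05):
    """Proteins called significant in BOTH analyses."""
    return significant_set(ids, p_all, alpha) & significant_set(ids, p_without_outliers, alpha)

ids = ["P1", "P2", "P3", "P4"]          # hypothetical protein IDs
p_all = [0.001, 0.01, 0.2, 0.04]        # hypothetical p-values, all samples
p_without = [0.002, 0.3, 0.01, 0.03]    # hypothetical p-values, outliers removed
robust = overlap(ids, p_all, p_without)  # {'P1'}
```

By construction the intersection can never be larger than either individual list, so this rule is more conservative than either analysis on its own.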

Thanks in advance for any suggestion.

proteomics PCA DEA outliers limma

updated 7 months ago by ATpoint • written 7 months ago by Jorge HB

There are a few options other than removing them:

1) Use sample weights, for example arrayWeights() in limma (see its manual), to down-weight outliers in a data-driven fashion.

2) Include the replicate information in the design, basically treating each replicate as a batch.

3) Use something like the sva package to estimate surrogate variables that capture unwanted variation, and then include these in the design.
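For option 2, a rough Python sketch of what a design with replicate as a batch factor looks like, assuming that replicate k of every condition was processed together (the post does not say so). It builds one indicator column per condition plus treatment-coded replicate columns, roughly what model.matrix(~0 + condition + replicate) gives in R, and fits each protein by ordinary least squares; limma's lmFit()/eBayes() would add variance moderation on top. Condition names other than red and green, and the simulated data, are hypothetical.

```python
import numpy as np

def build_design(conditions, replicates):
    """One indicator column per condition (cell-means coding) plus
    treatment-coded replicate columns, first replicate as reference."""
    cond_levels = list(dict.fromkeys(conditions))
    rep_levels = list(dict.fromkeys(replicates))
    cols, names = [], []
    for c in cond_levels:
        cols.append([1.0 if x == c else 0.0 for x in conditions])
        names.append(c)
    for r in rep_levels[1:]:
        cols.append([1.0 if x == r else 0.0 for x in replicates])
        names.append(f"rep{r}")
    return np.array(cols).T, names

def fit_all(y, design):
    """Ordinary least squares for every protein (rows of y) on one shared design.
    No empirical Bayes variance moderation, unlike limma::eBayes()."""
    coef, _, rank, _ = np.linalg.lstsq(design, y.T, rcond=None)
    resid = y.T - design @ coef
    df_resid = design.shape[0] - rank
    sigma2 = (resid ** 2).sum(axis=0) / df_resid
    return coef.T, sigma2, df_resid

# Hypothetical layout: 5 conditions x 3 replicates; only "red" and "green" come from the post.
conditions = [c for c in ["red", "green", "cond3", "cond4", "cond5"] for _ in range(3)]
replicates = [1, 2, 3] * 5
design, names = build_design(conditions, replicates)

rng = np.random.default_rng(1)
y = rng.normal(20.0, 1.0, size=(100, 15))   # simulated log2 intensities
y[:, np.array(replicates) == 3] += 2.0      # simulated shift shared by replicate 3
coef, sigma2, df_resid = fit_all(y, design)
red_vs_green = coef[:, names.index("red")] - coef[:, names.index("green")]
```

With 5 conditions and 3 replicates, the two batch columns cost 2 of the 10 residual degrees of freedom (8 remain), which is not negligible with so few samples.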

There seems to be a clear difference between conditions, so I would start with 1), since it is quick and easy to do, and then see what comes out.
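To illustrate the idea behind option 1, here is a deliberately crude Python stand-in, not the algorithm arrayWeights() implements (see its manual for that): each sample's weight is the inverse of its mean squared residual from its group mean, pooled over all proteins, so a noisy replicate is down-weighted rather than dropped. The function names, the geometric-mean rescaling and the simulated data are assumptions for this sketch.

```python
import numpy as np

def crude_sample_weights(y, groups):
    """Inverse of each sample's mean squared residual from its group mean,
    rescaled to geometric mean 1 (an arbitrary choice for this sketch)."""
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    resid = np.empty_like(y)
    for g in np.unique(groups):
        cols = groups == g
        resid[:, cols] = y[:, cols] - y[:, cols].mean(axis=1, keepdims=True)
    msr = (resid ** 2).mean(axis=0)   # one value per sample, pooled over proteins
    w = 1.0 / msr
    return w / np.exp(np.log(w).mean())

def weighted_group_mean(y, groups, w, g):
    """Per-protein mean of group g, with noisier samples contributing less."""
    cols = np.asarray(groups) == g
    return (y[:, cols] * w[cols]).sum(axis=1) / w[cols].sum()

# Hypothetical groups; only "red" and "green" come from the post.
groups = [c for c in ["red", "green", "cond3", "cond4", "cond5"] for _ in range(3)]
rng = np.random.default_rng(2)
y = rng.normal(20.0, 1.0, size=(200, 15))    # simulated log2 intensities
y[:, 0] += rng.normal(0.0, 3.0, size=200)    # make sample 0 (a "red" replicate) extra noisy
w = crude_sample_weights(y, groups)
red_vs_green = weighted_group_mean(y, groups, w, "red") - weighted_group_mean(y, groups, w, "green")
```

In limma itself, the weights returned by arrayWeights() are passed to lmFit() through its weights argument.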

7 months ago by ATpoint

To check whether imputation might be inducing this effect, have you looked at a PCA restricted to the proteins quantified in every sample, i.e. before imputation? You can easily produce a comparable plot with limma::plotMDS().
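A minimal Python sketch of that check, assuming a proteins x samples matrix of log-transformed intensities with NaN marking values that were missing before imputation. It keeps only proteins observed in every sample and runs a plain PCA via SVD; limma::plotMDS() draws a related but not identical MDS plot. The simulated data are hypothetical.

```python
import numpy as np

def pca_complete_cases(intensity, n_components=2):
    """PCA of samples using only proteins with no missing values.
    intensity: proteins x samples array of log-scale values, NaN = missing."""
    x = np.asarray(intensity, dtype=float)
    complete = ~np.isnan(x).any(axis=1)
    x = x[complete]
    x = x - x.mean(axis=1, keepdims=True)             # center each protein across samples
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = vt[:n_components].T * s[:n_components]   # samples x components
    explained = (s ** 2) / (s ** 2).sum()
    return scores, explained[:n_components], int(complete.sum())

rng = np.random.default_rng(3)
x = rng.normal(20.0, 0.2, size=(50, 6))   # simulated: 50 proteins, 6 samples
x[:25, 3:] += 3.0                          # samples 3-5 differ on half the proteins
x[[1, 7, 30], 2] = np.nan                  # three proteins missing in sample 2
scores, explained, n_used = pca_complete_cases(x)
```

If the two suspected outliers look unremarkable on this complete-case PCA but stand out on the imputed data, that would point toward imputation as a contributor.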

7 months ago by SamGG