How to avoid overfitting using SVA?

4.2 years ago

Aspire ▴ 360

The sva package enables estimation of surrogate variables from the data itself, so that unwanted sources of variation can be removed.

How can one make sure not to remove too much variation? What are the general guidelines?

RNA-Seq sva overfitting • 1.2k views

updated 2.4 years ago by ATpoint 85k • written 4.2 years ago by Aspire ▴ 360

I've been wondering about this ever since I stumbled across SVA (and RUV). Does anyone have best-practice comments? At least svaseq has an automated, and therefore reproducible, way of finding "significant" surrogate variables, whereas in RUVSeq it is entirely up to the user to choose the k value that determines the extent of the correction. Is there anything beyond looking at PCA plots?

2.4 years ago by ATpoint 85k

In a totally arbitrary manner, you could run num.sv with both methods available in the function ("be", "leek") and use the lower of the two estimated numbers of SVs, maybe? But yes, especially with a high number of samples, I've often gotten tens of SVs... I've also looked at PCA plots, as ATpoint says (or correlated the SVs with principal components), but I wonder: because the SVs are computed from the residual variation, should we also explore this in our data after removing the effect of our variable of interest?

2.4 years ago by Papyrus ★ 3.0k
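
For what it's worth, the check raised in that last question can be sketched on toy data. This is only an illustration in plain Python (not R, and not the sva package): the expression matrix, the hidden `batch` factor, and the stand-in surrogate variable `sv` are all simulated assumptions, and `remove_group_effect`, `first_pc`, and `pearson` are hypothetical helpers written for this example. The idea is to subtract the group means (the variable of interest), take the first principal component of what is left, and see whether a candidate SV tracks that residual component while staying uncorrelated with the variable of interest.

```python
import random

def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def remove_group_effect(expr, groups):
    """Subtract the per-group mean from every gene (row of expr)."""
    residuals = []
    for gene in expr:
        means = {}
        for g in set(groups):
            vals = [v for v, gg in zip(gene, groups) if gg == g]
            means[g] = sum(vals) / len(vals)
        residuals.append([v - means[g] for v, g in zip(gene, groups)])
    return residuals

def first_pc(matrix, rng, n_iter=200):
    """First principal component across samples (columns), by power iteration."""
    n = len(matrix[0])
    centred = []
    for row in matrix:
        m = sum(row) / n
        centred.append([v - m for v in row])
    # Sample-by-sample Gram matrix of the gene-centred data.
    gram = [[sum(r[i] * r[j] for r in centred) for j in range(n)] for i in range(n)]
    vec = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(n_iter):
        vec = [sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in vec) ** 0.5
        vec = [v / norm for v in vec]
    return vec

rng = random.Random(1)
groups = [0, 0, 0, 0, 1, 1, 1, 1]   # variable of interest
batch = [0, 1, 0, 1, 0, 1, 0, 1]    # hidden factor (assumed; unknown in practice)

# Simulated genes x samples matrix: 10 genes respond to group, 20 to batch.
expr = []
for g in range(50):
    group_fx = 3.0 if g < 10 else 0.0
    batch_fx = 2.0 if 5 <= g < 25 else 0.0
    expr.append([group_fx * grp + batch_fx * b + rng.gauss(0, 0.3)
                 for grp, b in zip(groups, batch)])

# Stand-in for a surrogate variable; in a real analysis this would come from sva.
sv = [b + rng.gauss(0, 0.1) for b in batch]

resid_pc1 = first_pc(remove_group_effect(expr, groups), rng)
r_sv_pc1 = pearson(sv, resid_pc1)    # sign of a PC is arbitrary, so look at |r|
r_sv_group = pearson(sv, groups)
print(f"SV vs residual PC1: {r_sv_pc1:.2f}")
print(f"SV vs variable of interest: {r_sv_group:.2f}")
```

In this toy setup, an SV that correlates strongly with a residual component but only weakly with the variable of interest looks like a nuisance factor; an SV that also correlates with the variable of interest would be a reason to suspect that the correction is eating into the signal. Whether that carries over to real data with many SVs is exactly the open question in this thread.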
