How to avoid overfitting using SVA?
4.2 years ago
Aspire ▴ 360

The sva package enables estimation of surrogate variables from the data itself, so that unwanted sources of variation can be removed.

How can one make sure not to remove too much variation? Are there general guidelines?

RNA-Seq sva overfitting • 1.2k views

I've been wondering about this ever since I stumbled over SVA (and RUV). Does anyone have best-practice comments? At least svaseq has an automated, and therefore reproducible, way of finding "significant" surrogate variables, whereas in RUVseq it's completely up to the user to choose the k value that determines the extent of the correction. Is there anything other than looking at PCA plots?


In a totally arbitrary manner, you could run num.sv with both methods available in the function ("be" and "leek") and use the lower number of estimated SVs, maybe? But yes, especially with a high number of samples, I've often got tens of SVs... I've also looked at PCA plots as ATpoint says (or computed correlations of the SVs with principal components), but I wonder: because the SVs are computed from the residual variation, should we also explore this in our data after having removed the effect of our variable of interest?
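To make the residual idea concrete, here is a minimal sketch in plain Python (not R, and not part of sva). It assumes the surrogate variables have already been estimated and exported. All data and names below (`condition`, `sv_batch_like`, `sv_condition_like`, `gene`) are made-up toy values for illustration only. The sketch checks two things: whether an SV correlates with the variable of interest (a warning sign that regressing it out could remove wanted signal), and whether an SV explains what is left of a gene after the group effect is subtracted.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def residualize(values, groups):
    """Remove the group effect by subtracting each group's mean."""
    group_means = {
        g: mean([v for v, h in zip(values, groups) if h == g])
        for g in set(groups)
    }
    return [v - group_means[g] for v, g in zip(values, groups)]


# Hypothetical toy data: 8 samples, variable of interest coded 0/1.
condition = [0, 0, 0, 0, 1, 1, 1, 1]

# Hypothetical surrogate variables (e.g. exported from an sva run).
sv_batch_like = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
sv_condition_like = [-0.9, -1.1, -1.0, -0.8, 1.0, 0.9, 1.1, 0.8]

# Hypothetical expression values of one gene across the 8 samples.
gene = [5.0, 3.1, 5.2, 2.9, 8.1, 6.0, 7.9, 6.1]

# 1) Does an SV track the variable of interest?
r_batch = pearson(sv_batch_like, condition)
r_cond = pearson(sv_condition_like, condition)

# 2) Does an SV explain the residual variation of the gene?
gene_resid = residualize(gene, condition)
r_resid = pearson(sv_batch_like, gene_resid)

print(f"SV (batch-like) vs condition:     {r_batch:.3f}")
print(f"SV (condition-like) vs condition: {r_cond:.3f}")
print(f"SV (batch-like) vs gene residual: {r_resid:.3f}")
```

In this toy case the batch-like SV is uncorrelated with the condition but strongly correlated with the gene's residuals, which is the pattern one would hope for. The condition-like SV tracks the condition closely, which would be a reason to look harder before including it. This is only a diagnostic sketch, not a rule for choosing the number of SVs.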
