4.2 years ago
Aspire
The sva package enables estimation of surrogate variables from the data itself, so that unwanted sources of variation can be removed.
How can one make sure not to remove too much variation? Are there general guidelines?
I've been wondering about this ever since I stumbled over SVA (and RUV). Anyone with some best-practice comments? At least svaseq has an automated, and therefore reproducible, way of finding "significant" surrogate variables, whereas in RUVSeq it's completely up to the user to decide on a `k` value that determines the extent of the correction. Anything other than looking at PCA plots?

In a totally arbitrary manner you could run
`num.sv` with both of its `method` options (`"be"` and `"leek"`) and use the lowest number of estimated SVs, maybe? But yes, especially with a high number of samples I've often got tens of SVs... I've also looked at PCA plots as ATpoint says (or correlated the SVs with principal components), but I wonder: because the SVs are computed on the residual variation, should we also explore this with our data after having removed the effect of our variable of interest?
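
For what it's worth, here is a rough sketch of that last idea. It is written in Python/NumPy rather than R and does not call sva at all. It regresses the variable of interest out of the expression matrix, runs a PCA on the residuals, and correlates a surrogate variable with the residual PCs. Everything in it is an assumption made for illustration: the helper names (`residualize`, `top_pcs`, `sv_pc_correlation`) are hypothetical, the data are simulated, and `sv` is a stand-in for an SV that would in practice come from `sva`/`svaseq`.

```python
import numpy as np


def residualize(expr, design):
    """Remove the least-squares fit of `design` (samples x covariates)
    from `expr` (genes x samples) and return the residual matrix."""
    beta, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    return expr - (design @ beta).T


def top_pcs(mat, k):
    """Top-k principal component scores (samples x k) of a genes x samples matrix."""
    centered = mat - mat.mean(axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T


def sv_pc_correlation(svs, pcs):
    """Absolute Pearson correlation of every SV column with every PC column."""
    n_sv = svs.shape[1]
    corr = np.corrcoef(svs.T, pcs.T)
    return np.abs(corr[:n_sv, n_sv:])


# Toy data (simulated): 500 genes, 20 samples, a known group effect and a
# hidden "batch" factor that is balanced across the two groups.
rng = np.random.default_rng(0)
n_genes, n_samples = 500, 20
group = np.repeat([0.0, 1.0], n_samples // 2)   # variable of interest
batch = np.tile([0.0, 1.0], n_samples // 2)     # hidden unwanted factor
expr = (
    rng.normal(size=(n_genes, n_samples))
    + 2.0 * np.outer(rng.normal(size=n_genes), group)
    + 3.0 * np.outer(rng.normal(size=n_genes), batch)
)

# Remove intercept + variable of interest, then look at what is left.
design = np.column_stack([np.ones(n_samples), group])
resid = residualize(expr, design)
pcs = top_pcs(resid, k=3)

# Stand-in for an estimated surrogate variable (assumption: a real one
# would be a column of the `sv` matrix returned by sva/svaseq).
sv = batch.reshape(-1, 1)
corr = sv_pc_correlation(sv, pcs)
print(np.round(corr, 2))  # rows: SVs, columns: residual PC1..PC3
```

If an SV lines up with a leading residual PC, it is at least capturing structure that is still there after the variable of interest has been accounted for. SVs that only track low-variance residual PCs would be the ones I'd be more suspicious of, though that is a heuristic, not a rule from the sva documentation.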