Hi, I'm currently struggling with an RNA-Seq experiment, especially with batch effects that potentially affect my analysis. I want to use svaseq() from the sva package, as recommended here (chapter "Removing hidden batch effects"), to find and account for hidden surrogate variables.
Because I do not see clear batch-effect clustering in my raw data, I thought I would estimate the number of surrogate variables with num.sv() and then pass this result to svaseq(). Interestingly, num.sv() gives me 10 surrogate variables, which confused me a bit. Additionally, svaseq() runs into an error when using 10 surrogate variables. Because of that, I wanted to set the n.sv argument by hand to the number of batch effects I assume, which is 3.
My question now is: what should I put into the n.sv argument of svaseq()? Is it simply 3? In the above-mentioned manual they write:
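To make the two calls concrete, here is a minimal sketch on simulated counts, not my real data. The names `pheno`, `condition`, `batch` and all simulation parameters are hypothetical and only there to make the example run. One detail that may matter: svaseq() log-transforms its input internally (log(dat + constant), with constant = 1 by default), so in this sketch num.sv() is given log(dat + 1) to work on the same scale. Whether that explains an estimate of 10 is an open assumption.

```r
library(sva)

set.seed(1)
n_genes   <- 1000
n_samples <- 12

# Hypothetical sample sheet: the condition is known, the batch is "hidden"
pheno <- data.frame(condition = factor(rep(c("ctrl", "trt"), each = 6)))
batch <- rep(0:2, times = 4)  # used only to simulate, never passed to sva

# Simulate negative-binomial counts with batch and condition effects
base      <- rnorm(n_genes, mean = 5, sd = 1)
batch_eff <- c(rnorm(300, sd = 0.5), rep(0, 700))  # 300 genes respond to batch
cond_eff  <- c(rep(0, 900), rnorm(100, sd = 1))    # 100 genes respond to condition
log_mu <- base + outer(batch_eff, batch) +
  outer(cond_eff, as.numeric(pheno$condition) - 1)
cts <- matrix(rnbinom(n_genes * n_samples, mu = exp(log_mu), size = 10),
              nrow = n_genes)

# Drop very low-count genes, then build the full and the null model
dat  <- cts[rowMeans(cts) > 1, ]
mod  <- model.matrix(~ condition, data = pheno)
mod0 <- model.matrix(~ 1, data = pheno)

# Estimate the number of surrogate variables on the log scale
n_est <- num.sv(log(dat + 1), mod, method = "be")

# Alternatively, fix the number by hand (here 3, the assumed number of batches)
svseq <- svaseq(dat, mod, mod0, n.sv = 3)
dim(svseq$sv)  # samples x surrogate variables
```

If n.sv is left at its default (NULL), svaseq() calls num.sv() itself on the log-transformed data, so the separate num.sv() step is optional.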
As we described above, we are trying to recover any hidden batch effects, supposing that we do not know the cell line information... Finally we specify that we want to estimate 2 surrogate variables.
Here is what they define for svaseq:
svseq <- svaseq(dat, mod, mod0, n.sv=2)
So they want to recover the cell line as a possible surrogate variable, but then set n.sv to two surrogate variables. Why? Are they assuming another batch effect besides the cell line? In the end, they add these two variables to the DESeq2 design, where they seem to represent the cell line effect. Maybe I missed something, but it is not described very clearly.
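For context, this is roughly how I understand the step where the surrogate variables end up in the design. It is a sketch on simulated data under my own assumptions: the 4-level `cell` grouping, the gene counts and the effect sizes are all made up, and only the svaseq() call and the design formula follow the manual. What the sketch does show is that svseq$sv holds continuous columns, one per surrogate variable, rather than a factor with cell line labels.

```r
library(DESeq2)
library(sva)

set.seed(2)
n_genes <- 1000

# Hypothetical design: 8 samples, known condition, hidden 4-level cell line
pheno <- data.frame(condition = factor(rep(c("untrt", "trt"), times = 4)),
                    row.names = paste0("s", 1:8))
cell <- rep(1:4, each = 2)  # used only to simulate, never passed to sva

# Simulate counts in which 300 genes differ between cell lines
base     <- rnorm(n_genes, mean = 5, sd = 1)
cell_eff <- matrix(0, n_genes, 4)
cell_eff[1:300, ] <- rnorm(300 * 4, sd = 0.5)
cond_eff <- c(rep(0, 900), rnorm(100, sd = 1))
log_mu <- base + cell_eff[, cell] +
  outer(cond_eff, as.numeric(pheno$condition == "trt"))
cts <- matrix(rnbinom(n_genes * 8, mu = exp(log_mu), size = 10),
              nrow = n_genes,
              dimnames = list(paste0("g", seq_len(n_genes)), rownames(pheno)))

dds <- DESeqDataSetFromMatrix(cts, colData = pheno, design = ~ condition)
dds <- estimateSizeFactors(dds)

# Same steps as in the manual: normalized counts, full and null model
dat  <- counts(dds, normalized = TRUE)
dat  <- dat[rowMeans(dat) > 1, ]
mod  <- model.matrix(~ condition, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))
svseq <- svaseq(dat, mod, mod0, n.sv = 2)

# Each surrogate variable becomes a numeric covariate in the design
ddssva <- dds
ddssva$SV1 <- svseq$sv[, 1]
ddssva$SV2 <- svseq$sv[, 2]
design(ddssva) <- ~ SV1 + SV2 + condition
```

Running DESeq(ddssva) afterwards would fit the model with SV1 and SV2 as covariates next to the condition.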
Thanks for all your help in advance.