4.1 years ago
Aspire
Is there a monotonic relationship between the number of SVs one estimates (with the sva package) and the number of genes that come out as significantly DE? In other words, is it true that the more SVs are used, the more DE genes will be significant?
(I understand that a large number of significantly DE genes is not a goal in itself, and that the more SVs one estimates, the higher the risk of overfitting. I am asking just for understanding's sake.)
Could you rephrase your question (and maybe give an example)? It's not clear. Are you asking whether there is a linear relationship between the number of SVs and the number of DE genes?
Clarified the question.
No, why should there be?
From the SVA manual:
IMHO, significantly DE genes are exactly the genes that are consistently different between the groups. So it seems that the stated goal of SVA already implies the definition of significantly DE genes.
Also, when I adjust for SVs using limma's removeBatchEffect and plot the PCA or distance heatmap, the more SVs are included, the better the separation between the groups. Even though the statistical method for calling DE genes (DESeq2) is different from PCA separation, good separation on the PCA often goes together with many significantly DE genes.
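To make the adjustment step above concrete, here is a minimal NumPy sketch of the general idea: regress the covariates (here standing in for SVs) out of each gene while keeping the group term in the model, so the group effect is protected. This is an illustration under stated assumptions, not limma's actual code. The function name `remove_covariates`, the toy data, and the choice to center the covariates are all mine.

```python
import numpy as np

def remove_covariates(expr, design, covariates):
    """Regress covariates out of expr while protecting the design columns.

    expr:       genes x samples matrix (e.g. log-scale expression)
    design:     samples x p matrix (intercept and group indicator)
    covariates: samples x k matrix (e.g. estimated surrogate variables)
    """
    # Center covariates so that subtracting them leaves gene means unchanged
    # (an assumption of this sketch).
    cov = covariates - covariates.mean(axis=0)
    X = np.hstack([design, cov])
    # Least-squares fit of every gene on design + covariates at once.
    coef, *_ = np.linalg.lstsq(X, expr.T, rcond=None)
    beta_cov = coef[design.shape[1]:, :]  # k x genes
    # Subtract only the covariate part; the group effect stays in the data.
    return expr - (cov @ beta_cov).T


# Hypothetical toy data: 2 genes, 8 samples, one hidden factor, no noise.
rng = np.random.default_rng(0)
group = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
hidden = rng.normal(size=group.size)
design = np.column_stack([np.ones_like(group), group])
# Gene 0 responds to group and the hidden factor; gene 1 only to the hidden factor.
expr = np.outer([1.0, 0.0], group) + np.outer([2.0, -3.0], hidden)

adjusted = remove_covariates(expr, design, hidden[:, None])
```

In this noise-free toy case, adjusted gene 0 keeps its group difference and gene 1 becomes constant across samples. Each covariate that is regressed out removes variance from the data, which is consistent with the cleaner PCA separation described above; it does not by itself say anything about how many genes a separate test such as DESeq2 will call significant.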