Hi everyone,
I want to correlate the surrogate variables I get from sva (https://www.bioconductor.org/packages/3.3/bioc/html/sva.html) with QC metrics, in particular gene body coverage.
I got the idea from this paper (http://www.nature.com/nbt/journal/v32/n9/full/nbt.3000.html): "Finally, we observed that the latent experimental factors determined by PEER and sva are highly correlated with QC metrics and properties, and that these factors were responsible for the majority of false positives in inter-site DEG analysis. For sva, the first latent factor was significantly correlated with the GC content distribution quality metric of the sites (P < 2 × 10^-7), the average error rate (P < 6 × 10^-7) and the duplication by library (see Supplementary Fig. 19, P < 2 × 10^-4)..."
Has anyone tried this before? What would be the right way to do it, and how could I determine whether biased gene body coverage increases the number of false positives in my particular experiment?
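To make the question concrete, here is a minimal sketch of the kind of check I have in mind. It is written in plain Python purely for illustration (sva itself is an R package), and it is not the method used in the paper. It assumes the surrogate variables have already been exported as one list of per-sample values per factor, and that each sample has a single gene body coverage bias score. The names `coverage_bias`, `surrogate_variables`, `spearman` and `permutation_pvalue` are hypothetical, and all the numbers are made-up toy values, not real data.

```python
import random
import statistics


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Spearman correlation of x and y."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy inputs (assumed, not real): one QC score and two surrogate variables
# for the same eight samples, in the same sample order.
coverage_bias = [0.10, 0.15, 0.22, 0.30, 0.41, 0.48, 0.55, 0.63]
surrogate_variables = {
    "SV1": [-0.42, -0.31, -0.20, -0.05, 0.08, 0.19, 0.30, 0.41],
    "SV2": [0.12, -0.35, 0.28, -0.07, -0.22, 0.31, -0.18, 0.11],
}

for name, sv in surrogate_variables.items():
    rho = spearman(sv, coverage_bias)
    p = permutation_pvalue(sv, coverage_bias)
    print(f"{name}: rho={rho:.3f}, permutation p={p:.4f}")
```

I chose a rank correlation with a permutation p-value here only because it makes few distributional assumptions with a small number of samples. I am not sure whether that is the appropriate test, or whether the p-values should also be corrected for testing several surrogate variables against several QC metrics.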
Thanks, Maria