For DGE using RNA-Seq, what is an acceptable correlation (read counts) between samples? Is this across all samples or within treatments? What is the correlation below which samples must be discarded?
Hi, your question is vague and it would help to understand the context in which you wish to perform a correlation analysis.
If you're referring to correlation as part of sample QC, then these issues are dealt with during the normalisation process. In this regard, other parameters to consider include dispersion and the coefficient of variation.
If you're referring to just testing whether or not one transcript is correlated with another between, for example, 2 treatment groups, then run cor.test() (in R), which will derive a P value from the correlation test.
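As a minimal sketch of the cor.test() approach, assuming simulated expression values (gene_a and gene_b are hypothetical names used only for illustration, not real data):

```r
# Simulated expression values for two hypothetical transcripts
# measured across the same 12 samples (illustrative data only).
set.seed(1)
gene_a <- rnorm(12, mean = 8, sd = 1)
gene_b <- 0.8 * gene_a + rnorm(12, mean = 0, sd = 0.5)

# Pearson correlation test; use method = "spearman" for a rank-based test.
res <- cor.test(gene_a, gene_b, method = "pearson")

res$estimate  # correlation coefficient
res$p.value   # P value for the null hypothesis of zero correlation
```

For real data you would replace the simulated vectors with the normalised expression values of the two transcripts, in the same sample order.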
Further information: if all of your samples were processed in exactly the same way, then I would expect good correlation (upward of 0.95, with a highly statistically significant P value) between samples when using all transcripts in the transcriptome, irrespective of case-control or treatment status and irrespective, also, of whether the counts are raw or normalised. With raw counts, you will see slightly lower correlation.
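A rough way to look at sample-to-sample correlation across all transcripts is sketched below. It assumes a simulated count matrix (genes in rows, samples in columns) and a log2(count + 1) transform, both of which are my own illustrative choices. The 0.95 cut-off is only the rule of thumb mentioned above, not a fixed threshold for discarding samples.

```r
# Simulate a hypothetical count matrix: 2000 genes x 6 samples.
set.seed(42)
n_genes <- 2000
n_samples <- 6
gene_means <- rgamma(n_genes, shape = 0.5, scale = 2000)  # per-gene mean expression
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = gene_means, size = 10),
  nrow = n_genes, ncol = n_samples,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)

# Log-transform so a handful of very highly expressed genes do not dominate.
log_counts <- log2(counts + 1)

# Pairwise Pearson correlation between samples (columns), using all genes.
sample_cor <- cor(log_counts, method = "pearson")

# Lowest pairwise correlation, and any pairs below the 0.95 rule of thumb.
min_cor <- min(sample_cor[upper.tri(sample_cor)])
low_pairs <- which(sample_cor < 0.95 & upper.tri(sample_cor), arr.ind = TRUE)

round(sample_cor, 3)
min_cor
low_pairs
```

A sample that shows up repeatedly in low_pairs is worth a closer look (library size, batch, mapping rate) before deciding whether to drop it.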
Very comprehensive response for such a vague question! +1
Hey Andrew - good to see you again!
Also depends on whether your samples are technical (usually higher correlation) or biological replicates.