20 months ago
star
I have an RNA-seq count table generated by integrating several studies. I want to calculate RPKM, but first I run RUVr to remove unwanted variation. If I use normCounts(RUVr) to calculate RPKM, would that be correct?
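For reference, both RPKM and TPM are computed per sample from a count and a gene length: RPKM = count × 10^9 / (gene length in bp × library size), while TPM first divides each count by the gene length in kilobases and then rescales the sample so its values sum to 10^6. The Python sketch below is a minimal illustration, not code from any package; the function names and toy numbers are hypothetical, and it assumes the library size is the column total of the count table rather than the total number of mapped reads.

```python
from typing import List, Sequence


def rpkm(counts: Sequence[float], lengths_bp: Sequence[float]) -> List[float]:
    """Reads per kilobase of gene per million reads, for a single sample.

    Assumption: the library size is the sum of this sample's counts in the table.
    """
    library_size = sum(counts)
    return [c * 1e9 / (length * library_size) for c, length in zip(counts, lengths_bp)]


def tpm(counts: Sequence[float], lengths_bp: Sequence[float]) -> List[float]:
    """Transcripts per million for a single sample: length-normalize first, then scale to 1e6."""
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]  # reads per kilobase
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]


if __name__ == "__main__":
    # Toy sample: three genes whose counts are proportional to their lengths.
    counts = [100, 200, 700]
    lengths_bp = [1000, 2000, 7000]
    print(rpkm(counts, lengths_bp))  # [100000.0, 100000.0, 100000.0]
    print(tpm(counts, lengths_bp))   # each about 333333.33; the values sum to 1e6
```

Both quantities only rescale values within a single sample, so neither one removes between-study batch effects on its own.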
RUVr should be run on the raw data, as far as I am aware. The "corrected" counts it produces should not be used for downstream analyses (they are for visualization purposes only). What you need to do is take the factors of unwanted variation estimated by RUVr (the W matrix) and supply these alongside your design matrix to downstream tools such as DESeq2 (again, with the raw counts as the primary input).

Thank you @Dunois, but how can we compare genes across samples selected from different studies when there is a batch effect?
If you are using DESeq2, for example, you should include the batch as a variable in the model. This is covered fairly extensively in the vignette:
http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
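To make "include the batch as a variable" concrete: in DESeq2 this means a design formula such as ~ batch + condition passed to DESeqDataSetFromMatrix, and the RUVSeq vignette adds the RUVr factor the same way (for example ~ W_1 + x). The Python sketch below is not DESeq2 code; it only expands such a formula into the treatment-coded design matrix a GLM would fit. The function name, column layout and toy samples are hypothetical, and the alphabetically first level of each factor is assumed to be the reference level.

```python
from typing import List, Optional, Sequence


def design_matrix(batch: Sequence[str], condition: Sequence[str],
                  w: Optional[Sequence[float]] = None) -> List[List[float]]:
    """Treatment-coded design for ~ [W_1 +] batch + condition (illustrative only).

    The first sorted level of each factor is the reference, so batch and
    condition columns are offsets relative to that reference level.
    """
    b_levels = sorted(set(batch))
    c_levels = sorted(set(condition))
    rows = []
    for i, (b, c) in enumerate(zip(batch, condition)):
        row = [1.0]  # intercept
        if w is not None:
            row.append(float(w[i]))  # continuous unwanted-variation factor, e.g. W_1 from RUVr
        row += [float(b == lvl) for lvl in b_levels[1:]]  # one column per non-reference batch
        row += [float(c == lvl) for lvl in c_levels[1:]]  # one column per non-reference condition
        rows.append(row)
    return rows


if __name__ == "__main__":
    batch = ["study1", "study1", "study2", "study2"]
    condition = ["ctrl", "trt", "ctrl", "trt"]
    for row in design_matrix(batch, condition, w=[0.1, -0.2, 0.3, -0.2]):
        print(row)  # columns: intercept, W_1, batch study2, condition trt
```

The batch column lets the model absorb a per-study shift in expression, so the condition coefficient is estimated within studies rather than across them.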
That requires the tested covariate(s) to be present in all batches, though. For example, if you have five studies and want to test control vs treatment, then each study needs both controls and treatments. You cannot collect treatments from studies 1-3 and controls from studies 4-5; that design would be so-called "confounded" or "nested". Is that the case, @star?
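The requirement above can be checked mechanically before fitting anything. This Python sketch (the function and sample names are hypothetical) cross-tabulates studies against conditions and lists, for each study, the condition levels it lacks; an empty result means every study contains every condition, which is the situation described above.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def missing_conditions_per_batch(batch: Sequence[str],
                                 condition: Sequence[str]) -> Dict[str, List[str]]:
    """Map each batch to the condition levels it lacks (empty dict: every batch has every condition)."""
    all_levels = set(condition)
    seen: Dict[str, Set[str]] = defaultdict(set)
    for b, c in zip(batch, condition):
        seen[b].add(c)
    return {b: sorted(all_levels - levels)
            for b, levels in sorted(seen.items())
            if all_levels - levels}


if __name__ == "__main__":
    # The confounded example from the comment: treatment from studies 1-3, control from 4-5.
    batch = ["s1", "s1", "s2", "s2", "s3", "s4", "s4", "s5"]
    condition = ["trt", "trt", "trt", "trt", "trt", "ctrl", "ctrl", "ctrl"]
    print(missing_conditions_per_batch(batch, condition))
```

In the confounded example every study is missing one of the two conditions, so a study effect cannot be separated from the treatment effect.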
Thank you @ATpoint.
I have 3 samples from Study 1 and 5 samples from Study 2, two of which are the same samples as in Study 1 (Study 1 = samples A, B, C; Study 2 = samples A, B, D, E, F). I don't want to run a differential expression analysis; I only want to calculate TPM/RPKM from the count table, because I would like to check the expression of a subset of genes across those samples in a heatmap.
Since I expect, e.g., samples A and B to cluster together in a PCA plot, I ran RUVr. When I plot a PCA on normCounts(method3_RUVr), it looks nice, but I am not sure: can I use normCounts(method3_RUVr) to calculate TPM?
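One visualization-only option, offered here as an assumption rather than something proposed in this thread, is to compute TPM from the raw counts, log-transform it, and then subtract each study's mean per gene before drawing the heatmap. This is a swapped-in, deliberately crude technique (simple per-study mean-centering, similar in spirit to batch-removal functions such as limma's removeBatchEffect), it assumes the studies are not confounded with the biology of interest, and, like RUVr's normCounts, its output is for plotting only. The function names and TPM values are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def log2_tpm(tpm_values: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    """log2(TPM + pseudocount) for one gene across samples."""
    return [math.log2(v + pseudocount) for v in tpm_values]


def center_within_study(values: Sequence[float], study: Sequence[str]) -> List[float]:
    """Subtract each study's mean from one gene's values (visualization only)."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for v, s in zip(values, study):
        groups[s].append(v)
    study_means = {s: sum(vs) / len(vs) for s, vs in groups.items()}
    return [v - study_means[s] for v, s in zip(values, study)]


if __name__ == "__main__":
    # Hypothetical TPMs for one gene; samples A, B, C (Study 1) then A, B, D, E, F (Study 2).
    study = ["Study1"] * 3 + ["Study2"] * 5
    tpm_gene = [31.0, 63.0, 127.0, 127.0, 255.0, 511.0, 511.0, 511.0]
    logged = log2_tpm(tpm_gene)  # [5.0, 6.0, 7.0, 7.0, 8.0, 9.0, 9.0, 9.0]
    print(center_within_study(logged, study))
```

Because samples A and B appear in both studies, they give a rough sanity check: after any such adjustment, the two copies of each sample should sit closer together than before.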