filtering lowly expressed reads
1
1
Entering edit mode
10 months ago

How do I ensure that filtration of lowly expressed counts in RNA seq data was done correctly? My goal is to find DEGs using limma voom

cpm <- cpm(dge)
lcpm <- cpm(dge, log=TRUE)
L <- mean(dge$samples$lib.size) * 1e-6
M <- median(dge$samples$lib.size) * 1e-6
c(L, M)
lcpm.cutoff <- log2(10/M + 2/L)
library(RColorBrewer)
nsamples <- ncol(dge$counts)
col <- brewer.spectral(nsamples)
par(mfrow=c(1,2))
plot(density(lcpm[,1]), col=col[1], lwd=2, ylim=c(0,0.55), las=2, main="", xlab="")
title(main="A. Raw data", xlab="Log-cpm")
abline(v=lcpm.cutoff, lty=3)
for (i in 2:nsamples){
    den <- density(lcpm[,i])
    lines(den$x, den$y, col=col[i], lwd=2)
}


##FilterbyExpr
keep.exprs <- filterByExpr(dge, group = dge$samples$group)
dge <- dge[keep.exprs,, keep.lib.sizes=FALSE]
dim(dge)

lcpm <- cpm(dge, log=TRUE)
plot(density(lcpm[,1]), col=col[1], lwd=2, ylim=c(0,0.5), las=2, main="", xlab="")
title(main="B. Filtered data", xlab="Log-cpm")
abline(v=lcpm.cutoff, lty=3)
for (i in 2:nsamples){
    den <- density(lcpm[,i])
    lines(den$x, den$y, col=col[i], lwd=2)
}

enter image description here

RNA-Seq Differential-Expression limma • 660 views
ADD COMMENT
0
Entering edit mode

Cross-posted to Bioconductor https://support.bioconductor.org/p/9155986/

ADD REPLY
0
Entering edit mode
10 months ago
CTLong ▴ 120

After filtering, you can plot your Voom transformed expression to see if there is any excess variance to the data, as shown by the authors vignette. You can change the min.count and min.prop parameters to filter more genes or less.

Briefly, you are aiming for something like the image below, so that most lowly expressed genes are removed and the square root of the residual standard deviation is small.here

ADD COMMENT

Login before adding your answer.

Traffic: 1613 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6