EdgeR normalization doesn't change gene count number
1
0
Entering edit mode
4.0 years ago
miller93 • 0

Hi everyone,

I am new to RNA-seq and have I recently performed RNA-seq (Illumina NovaSeq 6000; 25M reads; PE, 100bp; Mus_musculus:mm10). I am currently performing a DE on my samples using EdgeR and noticed that my read counts do not change after normalization. Based on my understanding of how EdgeR normalization works, the read counts should change. Should I use a different normalization approach? Can someone help determine what I am doing incorrectly? Below I've included the R commands I've used to perform the analysis and the output.

COMMAND LINES IN R:

head(fc_res)

row.names(fc_res) <- fc_res$Gene

fc_res <- fc_res[-c(1,20)]

group <- factor(c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6))

y <- DGEList(counts=fc_res, group=group)

y$samples

keep <- filterByExpr(y, design)

filt1 <- y[keep,,keep.lib.sizes=FALSE]

dim(filt1)

ynorm <- calcNormFactors(filt1)

ynorm$samples

plotMDS(ynorm)

OUTPUT:

         group lib.size norm.factors
S1_REP1             1 21780263    0.9964533

S1_REP2             1 19985191    0.9930817

S1_REP3             1 22149855    0.9885835

S2_REP1                2 23139231    0.9778646

S2_REP2                2 23631094    0.9808015

S3_REP3                2 25284278    0.9769698

S3_REP1     3 20292937    0.9350821

S3_REP2     3 25694663    0.9335373

S3_REP3     3 20825373    0.9310621

S4_REP1        4 23995722    0.9514329

S4_REP2        4 23429701    0.9364913

S4_REP3        4 22528422    0.9643743

S5_REP1     5 27466471    1.1070687

S5_REP2     5 29807414    1.0291615

S5_REP3     5 20450417    1.0667385

S6_REP1        6 24783768    1.0259821

S6_REP2        6 21733797    1.1276330

S6_REP3        6 18539629    1.1109447

Thank you, David

RNA-Seq R next-gen • 788 views
ADD COMMENT
0
Entering edit mode
4.0 years ago
brianj.park ▴ 60

You are not using discrete read counts for differential expression. There is a function called cpm that transforms your data into continuous log scale and you use these values for downstream differential expression. If you run lcpm <- cpm(y, log=TRUE) and lcpm <- cpm(ynorm, log=TRUE) you will see that the result reflects differences after your normalization.

Resource for edgeR here: https://www.bioconductor.org/packages/devel/workflows/vignettes/RNAseq123/inst/doc/limmaWorkflow.html, specifically section 5.1 for lcpm calculation and 5.3 for normalization.

ADD COMMENT

Login before adding your answer.

Traffic: 2438 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6