RNA-seq normalization and differential expression
9.2 years ago
zizigolu ★ 4.3k

Sorry friends,

I'm confused. As I understand it, in a typical RNA-seq analysis (for example with TopHat), once accepted_hits.bam has been produced, reads are counted with featureCounts or another tool to output a read count matrix. After that, is normalization (for example, log transformation) a separate second step? Or, when a tool such as DESeq2 produces RPKM or VST values, are normalization and differential expression analysis done together? In other words, is normalization an independent step before differential expression analysis?

Thank you

RNA-Seq • 4.6k views
9.2 years ago
jnf3769 ▴ 40

I'd say normalization is part of differential expression analysis. Depending on the biological context of your data and how you performed the sequencing, you may need to do additional normalization or quality control. Arguably, quality control could be considered either part of the analysis or a preprocessing step.


You mentioned quality control as a preprocessing step. In my work, the initial quality control check with box plots and MA plots showed that the data are not normalized. In that situation, do I have to normalize before running differential expression analysis with, say, edgeR, or should normalization be applied only to the resulting differentially expressed genes in order to compare samples?

9.2 years ago

Hi!

Most tools for differential expression analysis handle normalization themselves. In the case of DESeq2, the documentation states that only raw read counts must be used:

The count values must be raw counts of sequencing reads. This is important for DESeq2's statistical model to hold, as only the actual counts allow assessing the measurement precision correctly. Hence, please do not supply other quantities, such as (rounded) normalized counts, or counts of covered base pairs; this will only lead to nonsensical results.

PS: be aware that the differential expression results from DESeq2 are not RPKM values. RPKM values shouldn't be used as input for differential analysis anyway.
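
For reference, here is a minimal Python sketch of what an RPKM value is, to show why it is a different quantity from a raw count. The function name `rpkm` and the example numbers are made up for illustration; nothing here is DESeq2 code.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # Dividing by (length / 1e3) and (total reads / 1e6) is the same as
    # multiplying the count by 1e9 and dividing by length * total reads.
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example values: a long gene with many reads and a short
# gene with few reads, both in a library of 10 million mapped reads.
long_gene = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
short_gene = rpkm(read_count=50, gene_length_bp=200, total_mapped_reads=10_000_000)
print(long_gene, short_gene)
```

Both genes get the same RPKM even though one is backed by ten times as many reads. The number of reads behind a value is exactly what the count-based model relies on, which is why the raw counts are what should go into DESeq2.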


thank you so much

9.2 years ago

Differential expression analysis can refer to the entire pipeline; it's a general term. Normalisation is performed before testing for differentially expressed genes.


Thank you. Just so I understand, for example: should log transformation be applied before using DESeq2?

If you mean normalization by log transformation, DESeq2 won't work with normalized values.
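
A quick sanity check along these lines can catch a matrix that was accidentally log-transformed before being handed to a count-based tool. This is only a heuristic sketch in Python; `looks_like_raw_counts` is a hypothetical helper name, not a function from DESeq2 or any other package.

```python
import math


def looks_like_raw_counts(matrix):
    """Return True if every value is a non-negative whole number."""
    for row in matrix:
        for value in row:
            # Raw read counts cannot be negative or fractional.
            if value < 0 or not float(value).is_integer():
                return False
    return True


# Hypothetical count matrix: rows are genes, columns are samples.
raw = [[10, 0, 35], [120, 98, 143]]
# The same matrix after a log2(count + 1) transformation.
logged = [[math.log2(c + 1) for c in row] for row in raw]

print(looks_like_raw_counts(raw))
print(looks_like_raw_counts(logged))
```

The check passes for the raw matrix and fails for the log-transformed one. Passing it does not prove the values are genuine raw counts (rounded normalized counts would pass too); it only catches the obvious cases.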
9.2 years ago

DESeq and edgeR do not test for differential expression directly on normalized values. They estimate a library size (normalization) factor for each sample and use that factor in their downstream differential expression tests, while the counts themselves stay raw.
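
To make the size factor idea concrete, here is a rough Python sketch of the median-of-ratios calculation used by DESeq (the default in edgeR, TMM, is a different calculation). It is a simplified illustration under my own assumptions, not the package's code: the function name `size_factors` and the toy matrix are made up, and genes with a zero in any sample are simply skipped.

```python
import math
from statistics import median


def size_factors(counts):
    """counts: list of genes, each a list of raw counts (one per sample)."""
    n_samples = len(counts[0])
    # Keep genes with a positive count in every sample, so that the
    # log of the geometric mean is finite.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has a positive count in every sample")
    # Log geometric mean of each usable gene across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count in sample j to its geometric mean.
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(usable, log_geo_means)]
        # The median makes the factor robust to a few strongly changing genes.
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy matrix: sample 2 was sequenced exactly twice as deeply as sample 1.
counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 3],  # skipped: contains a zero
]
sf = size_factors(counts)
print(sf)
```

In this toy case the second factor is twice the first. The counts are not divided by these factors before testing; the factor enters the model as a scaling of each sample's expected count, which is why the tools ask for raw counts as input.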
