What is normalization w.r.t. gene expression counts or values?
9.8 years ago
murali ▴ 110

What is normalization w.r.t. gene expression counts or values?

What is the difference between FPKM/RPKM normalization and normalization of gene expression counts?

RNA-Seq Differential-expression • 7.4k views

Your first sentence doesn't quite parse. Are you asking what the normalization methods are? BTW, there's more than one way to generate FPKM/RPKM values, one of which uses normalized counts (as opposed to the original method that should never be used).


@Devon Ryan, yes. I want to get an overview of the normalization concept w.r.t. gene expression counts.

Can you please suggest a paper related to this topic?


aha, w.r.t means "with respect to"... now I get it

9.6 years ago
Martombo ★ 3.1k

It's really quite simple: the first quantity used to measure gene expression in RNA-seq was RPKM (Reads Per Kilobase per Million reads). It was just meant to give an idea of the read density in a certain region: 1 RPKM is the read density you would get over a region of 1 kb if you sequenced 1 million reads in your experiment. Ideally, this would allow you to compare the value between different samples. What has been observed, however, is that this measure is biased in specific situations (see the slides linked below) and is therefore not well suited for such a comparison, nor for differential expression analysis. The concept was extended to FPKM (Fragments Per Kilobase per Million reads) once paired-end sequencing was developed. So just use RPKM with single-end reads and FPKM with paired-end reads. They are computed in the same way, with fragments counted instead of reads.
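
As a rough illustration of the definition above, here is a minimal Python sketch. The function name `rpkm` and the example numbers are made up for illustration; in practice the "million reads" term is usually the number of mapped reads in the library, which is the assumption made here.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    kilobases = gene_length_bp / 1_000            # gene length in kb
    millions = total_mapped_reads / 1_000_000     # library size in millions of reads
    return read_count / kilobases / millions


# Hypothetical gene: 500 reads on a 2,000 bp gene, 10 million mapped reads in total.
example = rpkm(500, 2_000, 10_000_000)
print(example)
```

For FPKM the same function applies with fragment counts passed in place of read counts.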

Normalized counts are instead floating-point numbers produced by a normalization method, such as the size factors in DESeq2, which then permits a comparison of these values between different samples. In this case the bias that affects RPKM is not there. This is a bit more complicated to understand: for each gene, the count in every sample is divided by the geometric mean of that gene's counts across all samples. The median of these ratios over all genes in one sample determines the size factor of that sample. More simply, imagine you have only two samples and in the second one you find double the counts for most of the genes (that is, the median fold-change is 2). With this normalization you say: that probably means the second sample has twice the sequencing depth of the first, so to compare the two I have to divide the counts of the second sample by 2.
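
The median-of-ratios idea can be sketched in a few lines of plain Python. This is only an illustration of the description above, not DESeq2's actual code: the function name `size_factors`, the dict-of-lists input, and the counts are all hypothetical. The median is taken on the log scale, and genes with a zero count in any sample are skipped because their geometric mean is zero.

```python
import math
from statistics import median


def size_factors(counts):
    """counts: dict mapping sample name -> list of per-gene counts (same gene order)."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    log_ratios = {s: [] for s in samples}
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if min(values) <= 0:
            continue  # geometric mean would be 0, so the gene is not usable
        log_geo_mean = sum(math.log(v) for v in values) / len(values)
        for s in samples:
            log_ratios[s].append(math.log(counts[s][g]) - log_geo_mean)
    # Size factor = median ratio to the geometric mean, computed on the log scale.
    return {s: math.exp(median(log_ratios[s])) for s in samples}


# Hypothetical data: sample B has twice the counts of A for all genes but the last.
example_counts = {"A": [10, 20, 30, 40, 50], "B": [20, 40, 60, 80, 500]}
sf = size_factors(example_counts)
normalized = {s: [c / sf[s] for c in example_counts[s]] for s in example_counts}
print(sf)
```

Note that the individual size factors are defined relative to the geometric mean, so here they come out as roughly 0.71 and 1.41 rather than 1 and 2; what matters is that B's factor is twice A's, which matches the "divide the second sample by 2" intuition.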

I also happen to have some slides on this topic: https://www.dropbox.com/s/yewqwpfzl0ay3ta/normalization.pdf?dl=0

Each segment of the stacked bar plot represents the counts of one gene. In this example, condition B has 1.5 times the sequencing depth of condition A. Only one gene (red) is differentially expressed. With RPKM normalization all the other genes will look downregulated in the second sample (which is wrong). Size-factor normalization instead produces values that are stable for the genes that did not change.
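
To make the effect concrete, here is a small Python sketch with made-up counts (not the numbers from the slides): four genes, 1.5 times the depth in condition B, and one "red" gene that is additionally upregulated 4-fold. It compares scaling by total library size (the per-million part of RPKM; gene length cancels when comparing the same gene) with scaling by the median fold-change, the two-sample simplification of the size-factor idea.

```python
from statistics import median

# Hypothetical counts; the last gene ("red") is the only one truly changed.
cond_a = [100, 200, 300, 400]
cond_b = [150, 300, 450, 2400]  # 1.5x depth everywhere, red gene also 4x up

raw_ratios = [b / a for a, b in zip(cond_a, cond_b)]

# Scaling by total counts, as RPKM does.
total_ratio = sum(cond_b) / sum(cond_a)
fold_total = [r / total_ratio for r in raw_ratios]

# Scaling by the median fold-change between the two samples.
median_ratio = median(raw_ratios)
fold_median = [r / median_ratio for r in raw_ratios]

print("library-size scaling:", [round(f, 2) for f in fold_total])
print("median-ratio scaling:", [round(f, 2) for f in fold_median])
```

With library-size scaling the three unchanged genes come out at a fold-change below 1, because the red gene inflates the total; with median-ratio scaling they stay at 1 and only the red gene stands out.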


Hi Martombo, I liked the way you described normalization.
