How to normalize raw counts before drawing a heatmap?
5.5 years ago
smyiz

I have raw counts and edgeR differential expression results, and I want to draw a heatmap of logFC values. I have 12 samples: two cell lines, each with triplicate total and IP samples.

(cell1T1, cell1T2, cell1T3, cell1IP1, cell1IP2, cell1IP3, 
 cell2T1, cell2T2, cell2T3, cell2IP1, cell2IP2, cell2IP3)

I want to normalize the count data by calculating a scaling factor, CPM, and the fold change (IP/total). My R script:

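cs = colSums(count)                       # library size (total counts) per sample
scale_factor <- 1e6 / colSums(count)      # per-sample factor that rescales counts to CPM
scale_factor
data = t(t(count) / cs) * 1e6             # counts per million (CPM)
cs2 = colSums(data)
cs2                                       # every column should now sum to 1e6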

> cs = colSums(count)
cell1T1     cell1T2    cell1T3      cell1IP1     cell1IP2     cell1IP3 
9061105     6832076    1472003      12019856     5921757      2835648 
cell2T1     cell2T2    cell2T3      cell2IP1     cell2IP2     cell2IP3 
4696948     4387729    3907566      7580533      14312254     19052159

> scale_factor <-  1e6 / colSums(count)
> scale_factor

cell1T1     cell1T2    cell1T3      cell1IP1     cell1IP2     cell1IP3 
0.11036182  0.14636840 0.67934644   0.08319567   0.16886880   0.35265308 
cell2T1     cell2T2    cell2T3      cell2IP1     cell2IP2     cell2IP3
0.21290421  0.22790833 0.25591378   0.13191685   0.06987020   0.05248749

> data = t( t(count)/cs) * 1e6
> cs2 = colSums(data)
> cs2 

cell1T1     cell1T2    cell1T3      cell1IP1     cell1IP2     cell1IP3 
1e+06       1e+06      1e+06        1e+06        1e+06        1e+06 
cell2T1     cell2T2    cell2T3      cell2IP1     cell2IP2     cell2IP3
1e+06       1e+06      1e+06        1e+06        1e+06        1e+06

All columns sum to 1e6 (1 million). Are these CPM values? And after that, how can I calculate fold changes between IP and total?
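A small check of the arithmetic in the script above, on a hypothetical 4-gene, 2-sample matrix (the counts are made up for illustration): dividing each column by its total and multiplying by 1e6 gives the same result as multiplying each column by scale_factor, and that is the definition of counts per million (CPM).

```r
# Hypothetical toy counts: 4 genes x 2 samples with different library sizes
count <- matrix(c(10, 20, 30, 40,      # s1: library size 100
                  20, 20, 60, 100),    # s2: library size 200
                nrow = 4,
                dimnames = list(paste0("gene", 1:4), c("s1", "s2")))

cs <- colSums(count)
scale_factor <- 1e6 / cs               # 10000 for s1, 5000 for s2

# The question's approach: divide each column by its total, times 1e6
data <- t(t(count) / cs) * 1e6

# Same thing using scale_factor directly (column-wise multiplication)
data_sf <- sweep(count, 2, scale_factor, `*`)

all.equal(data, data_sf)   # TRUE
colSums(data)              # 1e6 for each sample
data
```

Because the scaling is purely by library size, these values are plain CPM; they do not include any between-sample normalization such as TMM.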


Hi,

You can apply z-score standardization to the edgeR-normalized counts.

In R, transpose the matrix, apply scale() to compute gene-wise z-scores (scale() standardizes columns, which is why the transpose is needed), then transpose the result back:

z_edgeRnormcounts = t(scale(t(edgeRnormcounts), center = TRUE, scale = TRUE))

You can use these z-scores to plot a heatmap of your genes of interest.
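A minimal sketch of the z-score approach on simulated counts. It assumes that "normalized counts" means log2-CPM with a pseudocount of 1 (an assumption; with edgeR you would more likely use cpm(y, log = TRUE) on a DGEList), then standardizes each gene across the 12 samples and draws a base-R heatmap.

```r
set.seed(42)
# Hypothetical count matrix: 50 genes x 12 samples, named as in the question
samples <- c(paste0("cell1T", 1:3), paste0("cell1IP", 1:3),
             paste0("cell2T", 1:3), paste0("cell2IP", 1:3))
count <- matrix(rpois(50 * 12, lambda = 200), nrow = 50,
                dimnames = list(paste0("gene", 1:50), samples))

# Simple log2-CPM with an assumed pseudocount of 1
# (edgeR's cpm(..., log = TRUE) uses a scaled prior count instead)
log_cpm <- log2(t(t(count) / colSums(count)) * 1e6 + 1)

# Gene-wise z-scores: scale() works on columns, so transpose, scale, transpose back
z <- t(scale(t(log_cpm), center = TRUE, scale = TRUE))

# Values are already scaled, so tell heatmap() not to rescale them
heatmap(z, Rowv = NA, Colv = NA, scale = "none")
```

After this step each gene has mean 0 and standard deviation 1 across samples, so the heatmap shows relative changes per gene rather than absolute expression levels.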

5.5 years ago
h.mon

Packages for differential expression analysis usually separate exploratory analyses (such as clustering, PCA, and heatmaps) from the actual differential expression testing.

edgeR provides the cpm() function, which computes counts per million from the raw counts; with log = TRUE it returns moderated log2-CPM instead (a prior count is added so that zero counts do not give -Inf). If you pass a DGEList object to cpm(), it uses the normalized library sizes (the library sizes multiplied by the normalization factors, e.g. those from calcNormFactors()). If you pass a plain matrix, as in cpm(count, log = FALSE), the result should be the same as yours above, so those are CPM values. You could use the cpmByGroup() function to get per-group CPM values and compute fold changes from them, but this is not the preferred method.
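If you only want a quick, descriptive IP/total ratio for plotting, a naive version can be computed by hand from the CPM matrix. This sketch is not cpmByGroup() and not edgeR's model-based logFC: it simply averages CPM within each group and takes a log2 ratio with an assumed pseudocount of 1. The counts are simulated, and the group labels are parsed from the sample naming scheme in the question (cell1T1, cell1IP1, ...).

```r
set.seed(7)
# Hypothetical data: 20 genes x 12 samples named as in the question
samples <- c(paste0("cell1T", 1:3), paste0("cell1IP", 1:3),
             paste0("cell2T", 1:3), paste0("cell2IP", 1:3))
count <- matrix(rpois(20 * 12, lambda = 100), nrow = 20,
                dimnames = list(paste0("gene", 1:20), samples))

# CPM, computed the same way as in the question
cpm_mat <- t(t(count) / colSums(count)) * 1e6

# Group label per sample: drop the trailing replicate number
group <- sub("[0-9]+$", "", colnames(count))  # "cell1T", "cell1IP", ...

# Mean CPM per gene within each group (one column per group)
group_mean <- sapply(split(seq_along(group), group),
                     function(idx) rowMeans(cpm_mat[, idx, drop = FALSE]))

# Naive log2(IP / total) per cell line; pseudo is an assumed pseudocount
pseudo <- 1
naive_lfc <- cbind(
  cell1 = log2((group_mean[, "cell1IP"] + pseudo) / (group_mean[, "cell1T"] + pseudo)),
  cell2 = log2((group_mean[, "cell2IP"] + pseudo) / (group_mean[, "cell2T"] + pseudo))
)
head(naive_lfc)
```

This ignores replicate variability and between-sample normalization, which is why the model-based logFC from edgeR is preferred for anything beyond a rough picture.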

In edgeR, differential expression testing, including fold-change estimation, is performed on the untransformed counts. edgeR offers several approaches to DE modeling and testing (such as glmQLFit()/glmQLFTest(), glmFit()/glmLRT(), and others); you then extract the fold changes from these results, e.g. from the logFC column of the table returned by topTags().
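A sketch of the edgeR quasi-likelihood route for this design, assuming four groups (two cell lines by total/IP, three replicates each) and simulated counts in place of the real count matrix. The group labels (cell1_T, cell1_IP, ...) and contrast names (cell1, cell2) are hypothetical; the resulting logFC columns are what you would feed to the heatmap.

```r
library(edgeR)   # also attaches limma, which provides makeContrasts()

set.seed(1)
samples <- c(paste0("cell1T", 1:3), paste0("cell1IP", 1:3),
             paste0("cell2T", 1:3), paste0("cell2IP", 1:3))
# Simulated counts stand in for the real 'count' matrix (hypothetical data)
count <- matrix(rnbinom(1000 * 12, mu = 100, size = 10), nrow = 1000,
                dimnames = list(paste0("gene", 1:1000), samples))

# Four groups of three replicates, in the same order as the columns
grp_levels <- c("cell1_T", "cell1_IP", "cell2_T", "cell2_IP")
group <- factor(rep(grp_levels, each = 3), levels = grp_levels)

y <- DGEList(counts = count, group = group)
y <- calcNormFactors(y)                  # TMM normalization factors
design <- model.matrix(~ 0 + group)      # one coefficient per group
colnames(design) <- levels(group)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)

# IP vs total within each cell line
con <- limma::makeContrasts(cell1 = cell1_IP - cell1_T,
                            cell2 = cell2_IP - cell2_T,
                            levels = design)
res1 <- glmQLFTest(fit, contrast = con[, "cell1"])
res2 <- glmQLFTest(fit, contrast = con[, "cell2"])

# Per-gene log2 fold changes (IP/total), one column per cell line
lfc <- cbind(cell1 = res1$table$logFC, cell2 = res2$table$logFC)
rownames(lfc) <- rownames(y)

topTags(res1, n = 5)
```

Testing IP against total separately within each cell line keeps the two lines from being pooled; you could then plot lfc for the genes you select from topTags().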
