I have raw counts and edgeR differential expression results, and I want to draw a heatmap of logFC values. I have 12 samples: two cell lines, each with total and IP in triplicate.
(cell1T1, cell1T2, cell1T3, cell1IP1, cell1IP2, cell1IP3,
cell2T1, cell2T2, cell2T3, cell2IP1, cell2IP2, cell2IP3)
I want to normalize the count data by calculating a scaling factor, then CPM, then the fold change (IP/total). My R script:
cs = colSums(count)
scale_factor <- 1e6 / colSums(count)
scale_factor
data = t( t(count)/cs) * 1e6
cs2 = colSums(data)
cs2
> cs = colSums(count)
cell1T1 cell1T2 cell1T3 cell1IP1 cell1IP2 cell1IP3
9061105 6832076 1472003 12019856 5921757 2835648
cell2T1 cell2T2 cell2T3 cell2IP1 cell2IP2 cell2IP3
4696948 4387729 3907566 7580533 14312254 19052159
> scale_factor <- 1e6 / colSums(count)
> scale_factor
cell1T1 cell1T2 cell1T3 cell1IP1 cell1IP2 cell1IP3
0.11036182 0.14636840 0.67934644 0.08319567 0.16886880 0.35265308
cell2T1 cell2T2 cell2T3 cell2IP1 cell2IP2 cell2IP3
0.21290421 0.22790833 0.25591378 0.13191685 0.06987020 0.05248749
> data = t( t(count)/cs) * 1e6
> cs2 = colSums(data)
> cs2
cell1T1 cell1T2 cell1T3 cell1IP1 cell1IP2 cell1IP3
1e+06 1e+06 1e+06 1e+06 1e+06 1e+06
cell2T1 cell2T2 cell2T3 cell2IP1 cell2IP2 cell2IP3
1e+06 1e+06 1e+06 1e+06 1e+06 1e+06
All columns now sum to 1e6 (1 million). Does that mean these values are CPM? And after that, how can I calculate fold changes between IP and total?
Hi,
You can apply z-score standardization to the edgeR-normalized counts.
In R, scale() standardizes columns, so transpose the data so that genes are in columns, apply scale() to get gene-wise z-scores, and then transpose back to the original genes-by-samples layout.
You can then use these z-scores to plot a heatmap for your genes of interest.
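The transpose, scale, re-transpose step described above can be sketched roughly as follows. This is only a minimal illustration in base R: the matrix `norm_counts` and its values are made up here to stand in for your edgeR-normalized data (for example log-CPM), and the gene names are hypothetical.

```r
# Toy matrix standing in for edgeR-normalized values (e.g. log-CPM).
# Rows are genes, columns are samples. Names and values are illustrative only.
set.seed(1)
norm_counts <- matrix(
  rnorm(4 * 6, mean = 5, sd = 2),
  nrow = 4,
  dimnames = list(
    paste0("gene", 1:4),
    c("cell1T1", "cell1T2", "cell1T3", "cell1IP1", "cell1IP2", "cell1IP3")
  )
)

# scale() centers and scales each column, so transpose to put genes in
# columns, standardize, then transpose back to genes-by-samples.
z_scores <- t(scale(t(norm_counts)))

# Sanity check: each gene (row) should have mean ~0 and standard deviation 1.
# Note: a gene with identical values in every sample would give NaN here.
print(round(rowMeans(z_scores), 10))
print(apply(z_scores, 1, sd))

# Base R heatmap; scale = "none" because the rows are already standardized.
heatmap(z_scores, scale = "none")
```

With real data you would replace `norm_counts` with your own normalized matrix and probably subset it to the genes of interest before plotting.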