Question

Strangely looking pheatmap after DEseq2

0

Entering edit mode

5.2 years ago

tanya_fiskur ▴ 80

Hello!

I am trying to draw a heatmap for the differentially expressed genes. I did it, following the tutorial: http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html

select <- order(rowMeans(counts(dds,normalized=TRUE)),
            decreasing=TRUE)[1:20]

> select
 [1]  4531  7824  5244  5309 18901 19260 19824 21819 16048 20366 20448 19275 22579 20979 10215
 [16] 22060  1652  2091 14069 17406

df <- as.data.frame(colData(dds)[,"Age"])
rownames(df) <- colnames(assay(ntd)[select,]) #additional line, recommended by other tutorials, without it heatmap outputs error

> df
   colData(dds)[, "Age"]
A1             3.5_weeks
A2             3.5_weeks
A3             3.5_weeks
B1             8.5_weeks
B2             8.5_weeks
B3             8.5_weeks
C1              14_weeks
C2              14_weeks
C3              14_weeks

pheatmap(assay(ntd)[select,], cluster_rows=FALSE, show_rownames=FALSE,
     cluster_cols=FALSE, annotation_col=df)`

The resulting pheatmap looks like this: https://ibb.co/WykjCkd

if picture is not shown, press the link above

Why are there strange lines each 5 genes, and no blue part of the plot below?

Thank you very much in advance!

pheatmap deseq2 RNA-Seq • 2.4k views

ADD COMMENT • link updated 5.2 years ago by Kevin Blighe 89k • written 5.2 years ago by tanya_fiskur ▴ 80

score 1 · Accepted Answer · 2020-04-26

1

Entering edit mode

5.2 years ago

Kevin Blighe 89k

I am not sure what is unusual here. This code will merely create a heatmap of the top 20 genes based on normalised count. The colour scheme is probably not intuitive because the colour shade goes from blue-white-red while the values go from ~10 to ~16.

It is likely pure coincidence that the unusual signal occurs every 5 rows.

If you want, please show the output of:

counts(dds,normalized=TRUE)[select,]

Kevin

ADD COMMENT • link 5.2 years ago by Kevin Blighe 89k

0

Entering edit mode

Thank you! The output:

> counts(dds,normalized=TRUE)[select,]
                      A1        A2        A3        B1         B2        B3        C1
Nf_1_004531 49216.609 38310.001 52683.232 60835.332 61316.0741 55007.475 62430.015
Nf_1_007824 27231.960 41369.197 35939.795 58822.836 27629.2749 47563.491 36320.484
Nf_1_005244 15839.796 18705.426 13647.251 21174.409 16791.7696 20812.083 18067.050
Nf_1_005309 19570.699 21828.399 21586.428 15830.231  9687.2361 15603.902 15171.921
Nf_1_018902 18209.008 18932.305 17598.355 17048.265  8326.8446 17837.531 16240.967
Nf_1_019261  5813.051 59252.844 38181.988  3079.857  1194.6951  4517.233  3284.321
Nf_1_019825 14219.933 17404.798 14295.140  9745.328 10061.8541 10433.745 10887.524
Nf_1_021820 10555.204  9751.580  8893.912 14670.148  8350.8586 15124.801 11661.803
Nf_1_016049 11427.581 11464.144 12827.455 13113.888  6883.6049 12741.249 11527.145
Nf_1_020367  8898.992 11056.391  8312.568 14339.298  9651.2151 13113.883 11805.492
Nf_1_020449 10592.485  8897.389 10083.402 11012.884  9007.6407  9972.026 11622.391
Nf_1_019276  3618.128 46698.221 22798.100  2040.945   838.0876  2720.335  2617.604
Nf_1_022580 11936.468 10349.618 10161.962  8227.000  9115.7036  7681.904  9713.379
Nf_1_020980  7412.409  7827.820  6603.658  8503.059  8671.4451  8985.579  8667.323
Nf_1_010215  7939.937  6996.630  7011.245  8853.929  6061.1264  8524.947  8476.832
Nf_1_022061  7673.377  6754.069  6422.508  8362.922  5995.0880  8219.670  8334.785
Nf_1_001652  7379.788  6745.705  6064.829  7216.537  8796.3178  7426.601  8060.545
Nf_1_002091  2989.942 35683.651 18605.771  1227.517   726.4226  1776.257  2300.667
Nf_1_014070  6849.465  6353.635  4742.249  8820.212  5603.6602  8394.580  7902.076
Nf_1_017407  6455.218  7014.404  6366.129  7669.612  5429.5590  8082.784  7204.979
                       C2         C3
Nf_1_004531 63510.5786 53728.3908
Nf_1_007824 53981.2295 44263.6530
Nf_1_005244 13580.9372 16039.7943
Nf_1_005309 14888.9704 11032.8475
Nf_1_018902 16629.6150 12382.9879
Nf_1_019261  1323.3318  1377.6510
Nf_1_019825 10746.4404  9781.1108
Nf_1_021820 12488.7848 12114.2295
Nf_1_016049 11875.9895  9266.8724
Nf_1_020367  9356.8144  9890.0955
Nf_1_020449 12096.1198  7152.7812
Nf_1_019276  1029.2581   971.3392
Nf_1_022580  6972.7773  5665.0873
Nf_1_020980  8380.2517  7967.5210
Nf_1_010215  8540.0374  6463.9556
Nf_1_022061  8381.1016  7127.3867
Nf_1_001652  7293.1987  6927.4051
Nf_1_002091   883.9211   838.0181
Nf_1_014070  7531.1775  7267.0564
Nf_1_017407  7182.7086  6822.6528

ADD REPLY • link 5.2 years ago by tanya_fiskur ▴ 80

0

Entering edit mode

Thanks! Yes, see, the issue is right there:

The genes represented by those bands just coincidentally have a wide range of values across your sample, so, high variance: Row 6:

Nf_1_019261  5813.051 59252.844 38181.988  3079.857  1194.6951  4517.233  3284.321

Row 12:

Nf_1_019276  3618.128 46698.221 22798.100  2040.945   838.0876  2720.335  2617.604

The samples that have large values 'boost up' the value of the mean for the gene. This is of course a prime example of where the mean can be misleading when the data distribution is non-uniform.

ADD REPLY • link 5.2 years ago by Kevin Blighe 89k

0

Entering edit mode

Thanks a lot! And you told that "the data distribution is non-uniform" - is there a way to check the uniformity or to improve it? Sorry if it is too basic question

ADD REPLY • link 5.2 years ago by tanya_fiskur ▴ 80

0

Entering edit mode

The Shapiro-Wilk test can test for the 'normality' of each gene; however, this test loses sensitivity beyond [and under] a certain sample size. You may want to consult a statistician.

For producing heatmaps like these, the data should be normalised, and is also usually transformed via log or some other transformation. Your input looks like normalised counts?

ADD REPLY • link 5.2 years ago by Kevin Blighe 89k