Hello!
I am trying to draw a heatmap of the differentially expressed genes. I did it following the tutorial: http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
```
select <- order(rowMeans(counts(dds, normalized=TRUE)),
                decreasing=TRUE)[1:20]
> select
[1] 4531 7824 5244 5309 18901 19260 19824 21819 16048 20366 20448 19275 22579 20979 10215
[16] 22060 1652 2091 14069 17406
df <- as.data.frame(colData(dds)[,"Age"])
rownames(df) <- colnames(assay(ntd)[select,]) # additional line recommended by other tutorials; without it, pheatmap throws an error
> df
   colData(dds)[, "Age"]
A1 3.5_weeks
A2 3.5_weeks
A3 3.5_weeks
B1 8.5_weeks
B2 8.5_weeks
B3 8.5_weeks
C1 14_weeks
C2 14_weeks
C3 14_weeks
pheatmap(assay(ntd)[select,], cluster_rows=FALSE, show_rownames=FALSE,
         cluster_cols=FALSE, annotation_col=df)
```
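A likely reason the extra `rownames(df) <- ...` line is needed: selecting a single column with `[, "Age"]` drops the result to a plain vector, so `as.data.frame()` loses the sample names, and `pheatmap` matches `annotation_col` rows to matrix columns by name. A minimal sketch, using a base-R `data.frame` called `coldata` as a hypothetical stand-in for `colData(dds)` (sample names and ages copied from the output above), showing that `drop = FALSE` keeps the names:

```r
# Hypothetical stand-in for colData(dds): row names are the sample IDs
coldata <- data.frame(
  Age = rep(c("3.5_weeks", "8.5_weeks", "14_weeks"), each = 3),
  row.names = c("A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2", "C3")
)

# Default single-column subsetting returns a bare vector, so names are lost
dropped <- as.data.frame(coldata[, "Age"])
rownames(dropped)   # "1" "2" ... "9"

# drop = FALSE keeps a one-column data frame, row names included
df <- as.data.frame(coldata[, "Age", drop = FALSE])
rownames(df)        # "A1" "A2" ... "C3"
```

With the real object this would be `df <- as.data.frame(colData(dds)[, "Age", drop = FALSE])`, which should make the extra `rownames()` line unnecessary.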
The resulting pheatmap looks like this: https://ibb.co/WykjCkd
Why are there strange lines every 5 genes, and why is there no blue in the plot?
Thank you very much in advance!
Thank you! The output:
Thanks! Yes, the issue is right there:
The genes represented by those bands just happen to have a wide range of values across your samples, i.e., high variance:
Row 6:
Row 12:
The samples with large values inflate the mean for the gene. This is a prime example of how the mean can be misleading when the data distribution is non-uniform.
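A toy illustration of how a few large values inflate the mean. The numbers are made up, not taken from this dataset: a gene with two very high samples outranks a steadily expressed gene by mean, but not by median.

```r
# Hypothetical normalised counts for two genes across nine samples
counts_mat <- rbind(
  steady = c(500, 520, 480, 510, 490, 505, 495, 515, 485),
  spiky  = c(10, 12, 8, 9, 11, 3000, 10, 4000, 9)
)

gene_means   <- rowMeans(counts_mat)            # steady = 500, spiky ~ 785
gene_medians <- apply(counts_mat, 1, median)    # steady = 500, spiky = 10

# Ranking by mean (as in the select <- order(rowMeans(...)) line) puts the
# spiky gene first; ranking by median puts the steady gene first
names(sort(gene_means, decreasing = TRUE))      # "spiky" "steady"
names(sort(gene_medians, decreasing = TRUE))    # "steady" "spiky"
```

In a top-20 selection by mean, genes like `spiky` get in on the strength of one or two samples, which would show up as isolated bright cells in an otherwise flat row.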
Thanks a lot! You said that "the data distribution is non-uniform". Is there a way to check the uniformity, or to improve it? Sorry if this is too basic a question.
The Shapiro-Wilk test can test the 'normality' of each gene; however, this test is unreliable at both ends of the sample-size range: with few samples it has little power, and with very many it flags even trivial departures from normality. You may want to consult a statistician.
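A minimal sketch of running the Shapiro-Wilk test gene by gene with base R's `shapiro.test()`. The matrix `sim_counts` is hypothetical (simulated negative binomial counts standing in for real data), and testing on `log2(x + 1)` values is an assumption, chosen to match the kind of values plotted in the heatmap:

```r
set.seed(1)
# Hypothetical matrix: 100 genes x 9 samples of simulated counts
sim_counts <- matrix(rnbinom(100 * 9, mu = 200, size = 2), nrow = 100,
                     dimnames = list(paste0("gene", 1:100), NULL))

# One p-value per gene (row); shapiro.test() errors if all values are
# identical, so such genes get NA instead
sw_p <- apply(log2(sim_counts + 1), 1, function(x) {
  if (length(unique(x)) < 2) return(NA_real_)
  shapiro.test(x)$p.value
})

summary(sw_p)
sum(sw_p < 0.05, na.rm = TRUE)   # genes flagged as non-normal at the 5% level
```

On the real data, `assay(ntd)` (which in the DESeq2 vignette holds log2(normalised counts + 1)) could be passed in place of `log2(sim_counts + 1)`. With only nine samples per gene, a non-significant p-value is weak evidence of normality, and with thousands of genes some will be flagged by chance alone.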
For producing heatmaps like these, the data should be normalised and are usually also transformed, via log or some other transformation. Is your input normalised counts?
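A small sketch of what the log transformation does to the colour range, on a made-up two-gene matrix (not from this dataset). It uses `log2(x + 1)`, the form DESeq2's `normTransform()` applies with its default pseudocount, and then optionally converts each row to z-scores so the colours show within-gene contrast:

```r
# Hypothetical normalised counts for two genes across nine samples
counts_mat <- rbind(
  steady = c(500, 520, 480, 510, 490, 505, 495, 515, 485),
  spiky  = c(10, 12, 8, 9, 11, 3000, 10, 4000, 9)
)

# log2(x + 1): compresses the 8 to 4000 range of the spiky gene to about 3 to 12
log_mat <- log2(counts_mat + 1)
range(counts_mat["spiky", ])   # 8 4000
range(log_mat["spiky", ])

# Row z-scores: centre each gene at 0 and scale it to unit standard deviation
z_mat <- t(scale(t(log_mat)))
rowMeans(z_mat)                # approximately 0 for each gene
```

In `pheatmap` itself the row-centring step is available as `scale = "row"`, e.g. `pheatmap(assay(ntd)[select, ], scale = "row", annotation_col = df)`. Leaving the data unscaled compares absolute expression between genes; scaling by row instead highlights each gene's pattern across samples, which tends to bring back both ends of the colour scale.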