Hello everyone, I hope you are doing well :)
I have a question about making a heatmap that shows both the significance AND the presence of pathways across different conditions. I already ran pathway enrichment analysis with gProfiler. I imported the results and reshaped them so that the pathway names are the row names, the conditions are the column names, and each cell holds the adjusted p value, like so:
kegg_react_wp[1:3,] # showing only first 3 pathways
Condition1 Condition2 Condition3 Condition4 Condition5 Condition6
Pathway1 0.0003 0.3225 0.0540 0.0003 0.01 0.07
Pathway2 0.03 0.003225 0.0540 0.0003 0.01 0.07
Pathway3 0.703 0.3225 0.0540 0.0003 0.01 0.07
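For anyone who wants to reproduce this, a toy version of the table can be rebuilt from the three rows printed above. Only these values are known from the post; the real `kegg_react_wp` has more pathways, so this is just a minimal stand-in.

```r
# Toy copy of the first three rows shown above (values copied from the post)
kegg_react_wp <- data.frame(
  Condition1 = c(0.0003, 0.03,     0.703),
  Condition2 = c(0.3225, 0.003225, 0.3225),
  Condition3 = c(0.0540, 0.0540,   0.0540),
  Condition4 = c(0.0003, 0.0003,   0.0003),
  Condition5 = c(0.01,   0.01,     0.01),
  Condition6 = c(0.07,   0.07,     0.07),
  row.names  = c("Pathway1", "Pathway2", "Pathway3")
)
kegg_react_wp[1:3, ]
```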
Because some pathways have an adjusted p value > 0.05, I want to set those cells to NA, meaning that the pathway was NOT enriched in that condition. I used the following function to do that, and also to take -log10 of the adjusted p values < 0.05, so that the most significant pathways get the largest values:
Mooi_functie <- function(x) {
  # NA if not significant (adjusted p > 0.05), otherwise -log10(adjusted p)
  ifelse(x > 0.05, NA, -log10(x))
}
Then I applied the function to my data:
library(dplyr)
pathways_clean <- kegg_react_wp %>%
  mutate_if(is.numeric, Mooi_functie) # cells with adjusted p value > 0.05 become NA
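The same transformation can be sketched in base R, which also makes it easy to check which cells end up as NA. This uses the toy values from the table above; `num_cols` and `mat` are illustrative names, and the base-R loop is a stand-in for `mutate_if()`, not the post's own code.

```r
# Toy values from the table above
kegg_react_wp <- data.frame(
  Condition1 = c(0.0003, 0.03, 0.703),  Condition2 = c(0.3225, 0.003225, 0.3225),
  Condition3 = c(0.054, 0.054, 0.054),  Condition4 = c(0.0003, 0.0003, 0.0003),
  Condition5 = c(0.01, 0.01, 0.01),     Condition6 = c(0.07, 0.07, 0.07),
  row.names  = c("Pathway1", "Pathway2", "Pathway3")
)
Mooi_functie <- function(x) ifelse(x > 0.05, NA, -log10(x))

# Base-R equivalent of mutate_if(is.numeric, Mooi_functie); row names are kept
pathways_clean <- kegg_react_wp
num_cols <- vapply(pathways_clean, is.numeric, logical(1))
pathways_clean[num_cols] <- lapply(pathways_clean[num_cols], Mooi_functie)

# Heatmap functions work on a numeric matrix
mat <- as.matrix(pathways_clean)
print(round(mat, 2))
```

In this toy subset, Pathway3 keeps only two non-NA cells (Condition4 and Condition5), which is the kind of sparse row described further down.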
Then I tried to make the heatmap (with pheatmap or ComplexHeatmap):
library(pheatmap)
library(ComplexHeatmap)
pheatmap(pathways_clean) # pheatmap package
Heatmap(pathways_clean)  # ComplexHeatmap package
I get the following errors:
Error in hclust(d, method = method) :
NA/NaN/Inf in foreign function call (arg 10) # from pheatmap
Error in hclust(get_dist(submat, distance), method = method) :
NA/NaN/Inf in foreign function call (arg 10) # from complexheatmap
A similar question about the same error has been posted here, but its goal was different from mine: unlike that question, I want to KEEP the NAs so that I can color them differently to show that these pathways are NOT enriched in that condition.
The error seems to come from clustering the rows: if I set cluster_rows = FALSE, it works. But I want to cluster the rows and scale by row to see in which condition the significance is largest. I understand that some rows are basically all NA except for one or two cells, and this seems to be what breaks the heatmap.
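A small experiment suggests why this happens. `dist()` skips NA cells pairwise, but when two rows share no non-NA column their distance is NA, and `hclust()` then fails with the same "NA/NaN/Inf in foreign function call" error. The matrix below is hypothetical (not from the post), built so that PathwayA and PathwayB are enriched in disjoint conditions. It also shows a side effect of the fake-column trick described next: in this toy case it makes the two disjoint pathways look identical (distance 0).

```r
# Hypothetical -log10(adjusted p) matrix (NOT from the post); NA = not enriched.
# PathwayA and PathwayB share no non-NA condition.
m <- rbind(
  PathwayA = c(3.5, 2.0, NA,  NA),
  PathwayB = c(NA,  NA,  1.5, 2.5),
  PathwayC = c(3.0, NA,  2.0, NA)
)
colnames(m) <- paste0("Condition", 1:4)

d <- dist(m)   # Euclidean distance; NA cells are excluded pairwise
print(d)       # the A-B entry is NA because no column is shared

# hclust() cannot handle NA distances
res <- tryCatch(hclust(d), error = function(e) conditionMessage(e))
print(res)

# The "fake column" trick: a constant column shared by every row
m_fake <- cbind(m, fake_column = 1)
d_fake <- dist(m_fake)
hc <- hclust(d_fake)   # now runs: every pair shares at least one column

# ...but A and B are compared only on the constant column, so they get distance 0
print(as.matrix(d_fake)["PathwayA", "PathwayB"])
```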
I found a trick online: add a column with any constant value. It worked:
pathways_clean$fake_column<-1
The heatmap now renders, but I am left with a fake column, and I don't want that column in the final figure.
This is the code I used to generate the heatmap:
pheatmap(pathways_clean, cluster_rows = TRUE, na_col = "white", border_color = "white",
         annotation_row = meta_kegg_wp_reac, cellwidth = 35, fontsize = 8, angle_col = 45,
         scale = "row")
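One way to avoid the fake column might be to compute the row clustering yourself on a filled-in copy of the matrix and pass the resulting `hclust` object to `cluster_rows`, while plotting the original matrix with its NAs. pheatmap documents `cluster_rows` as accepting either a logical or an `hclust` object (ComplexHeatmap's `Heatmap()` also accepts an `hclust` or dendrogram there). This is a sketch under an explicit assumption: scoring "not enriched" as -log10(p) = 0 is acceptable for grouping rows. The matrix is hypothetical, and `m_filled` and `row_hc` are illustrative names.

```r
# Hypothetical -log10(adjusted p) matrix; NA = not enriched (p > 0.05)
m <- rbind(
  PathwayA = c(3.5, 2.0, NA,  NA),
  PathwayB = c(NA,  NA,  1.5, 2.5),
  PathwayC = c(3.0, NA,  2.0, NA),
  PathwayD = c(NA,  1.8, 1.4, 2.2)
)
colnames(m) <- paste0("Condition", 1:4)

# Cluster on a copy where "not enriched" is scored 0
# (assumption: treating non-significant cells as -log10(p) = 0 is acceptable)
m_filled <- m
m_filled[is.na(m_filled)] <- 0
row_hc <- hclust(dist(m_filled), method = "complete")

# Plot the ORIGINAL matrix (NAs kept, drawn in white) with the precomputed
# row clustering; columns are left unclustered here to avoid NA distances
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(m,
                     cluster_rows = row_hc,
                     cluster_cols = FALSE,
                     na_col       = "white",
                     border_color = "white")
}
```

Adding `scale = "row"` on top of this would still blank out rows that have a single significant cell (see the scaling check after the questions), so the unscaled -log10 values may be easier to read here.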
My questions are:
1) How can I solve the clustering-with-NAs issue without adding a fake column? Or is there another way to visualize this without using NAs? (I tried replacing the NAs with zeros, but then the color scaling gets messy and I can't make heads or tails of it.)
2) Does it make sense to scale (scale = "row") the -log10 adjusted p values? I find z-scores of -log10 adjusted p values, which is what the heatmap legend shows, hard to interpret.
3) Do you think that there might be a better way to visualize these results in my case?
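On question 2, a quick check suggests that row scaling interacts badly with NAs. Row z-scores are computed here as (x - row mean) / row sd with `na.rm = TRUE`, which is my understanding of what pheatmap's `scale = "row"` does (an assumption, not verified against its source). A pathway significant in only one condition then gets an undefined sd, so its whole row becomes NA. The z-scores also only say where a pathway is *relatively* most significant, not how significant it is. For question 3, one option is a dot plot that shows presence (a dot only where enriched) and significance (dot size and color) on the raw -log10 scale. The matrix and the `long` data frame are hypothetical; ggplot2 is a swapped-in package, not something the post used.

```r
# Hypothetical -log10(adjusted p) matrix; NA = not enriched
m <- rbind(
  PathwayA = c(3.5, 2.0, NA,  NA),
  PathwayB = c(NA,  NA,  1.5, NA),   # significant in one condition only
  PathwayC = c(3.0, NA,  2.0, 2.0)
)
colnames(m) <- paste0("Condition", 1:4)

# Row z-scores with na.rm = TRUE (assumed to mirror scale = "row")
row_mean <- apply(m, 1, mean, na.rm = TRUE)
row_sd   <- apply(m, 1, sd,   na.rm = TRUE)
z <- (m - row_mean) / row_sd   # row_mean/row_sd recycle down each column
print(z["PathwayB", ])          # all NA: sd of a single value is NA

# Alternative: long format, keeping only enriched cells
long <- data.frame(
  pathway   = rep(rownames(m), times = ncol(m)),  # as.vector() is column-major
  condition = rep(colnames(m), each  = nrow(m)),
  neglog10p = as.vector(m)
)
long <- long[!is.na(long$neglog10p), ]

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = condition, y = pathway,
                        size = neglog10p, colour = neglog10p)) +
    geom_point() +
    labs(size = "-log10(adj. p)", colour = "-log10(adj. p)")
  print(p)
}
```

The row order of such a dot plot could reuse a precomputed `hclust` order (for example by setting `pathway` as a factor with levels in `hc$order` order) if clustered rows are still wanted.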
Thank you very much in advance for your help!
I was able to reach my goal using your approach. Great suggestions! Thank you very much!