Annotating proteins in a heatmap with one or more pathways using R
14 months ago
adam08 ▴ 10

Hello,

I am relatively new to programming in R, and I was wondering if someone could give me suggestions on annotating proteins in a heatmap. I have tried the following code, but the resulting heatmap is not color coded correctly. I am also unsure how to add multiple annotations to a single protein. E.g. if I wanted to annotate Protein 4 with all 3 paths, or with 2 specific paths, how could I do this? Any help would be greatly appreciated.

# Load pheatmap and read the CSV file
library(pheatmap)
cultureconditions <- read.csv("Sample proteomic data3.csv", row.names = 1)

# These are the names of the columns 
new_column_names <- c("condition1", "condition2", "condition3", "condition4")  
colnames(cultureconditions) <- new_column_names

#Generate annotation for rows
annotation_row = data.frame(
    GenePaths = factor(rep(c("Path1", "Path2", "Path3"), c(17, 5, 18)))
)

#Color code annotations
ann_colors = list(GenePaths = c(Path1 = "#7570B3", Path2 = "#E7298A", Path3 = "#66A61E")
)

# Create the heatmap
# scale=row allows for normalization with z scores 
#cluster_cols prevents the columns from clustering 

heatmap <- pheatmap(cultureconditions, scale = "row", cluster_rows = FALSE, cluster_cols = FALSE, annotation_row = annotation_row,
    annotation_colors = ann_colors, angle_col = 45, fontsize = 16)

This is the result I get:

[image: resulting heatmap]

Tags: R, Heatmap, pheatmap

When I run your code with a dummy dataset, it produces a figure as expected, with GenePaths colored. There's something about your dataset. Could you post the first 20 rows of your dataset? An easy way to do this is with dput, like this: dput(cultureconditions[1:20,]). If you cannot post your dataset, then create a dummy one (see the pheatmap docs for an example).
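A minimal sketch of such a dummy dataset, loosely in the spirit of the pheatmap documentation example. The random values, the 20 x 4 shape and the names (condition1..condition4, "Protein 1".."Protein 20") are assumptions chosen to mirror the question, not real data:

```r
# Build a small random dataset with the same layout as the question:
# 20 proteins (rows) x 4 culture conditions (columns).
set.seed(1)  # make the random values reproducible
dummy <- as.data.frame(matrix(rexp(80, rate = 1 / 1000), nrow = 20, ncol = 4))
colnames(dummy) <- paste0("condition", 1:4)
rownames(dummy) <- paste0("Protein ", 1:20)

# Print a copy-pasteable version of the first rows to post in the forum
dput(dummy[1:5, ])
```

Whatever dput prints can be pasted straight into a reply, and anyone can recreate the object by running it.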


Thank you for your response! Here is the output of the code below:

> dput(cultureconditions[1:20,])

structure(list(condition1 = c(153.6286609, 8495.577956, 1283.506874, 
5376.004328, 1345, 1263.966509, 780.2137199, 3096.534091, 5710.099775, 
437.9382486, 288.10209, 755.1863589, 1160.239461, 241.3590358, 
17675.16009, 9146.903437, 89.29736361, 700, 563.0639983, 4038.300533
), condition2 = c(130, 7304.219269, 1192.080765, 4922.022244, 
2235.794576, 1203.40006, 1291.896364, 2880.33954, 5002.754073, 
352.769094, 200, 500, 1108.729595, 270.5404985, 19282.83952, 
8766.93408, 121.0509699, 900, 588.2291337, 4067.275171), condition3 = c(563.7867629, 
29974.60495, 2494.897555, 8144.586067, 16165.33354, 4376.277767, 
2110.413052, 6043.137715, 8000, 1200, 800, 1206.256272, 2688.307957, 
811.9269546, 28301.27059, 14193.071, 221.0048808, 4931.831031, 
1243.234106, 8258.032841), condition4 = c(700, 30000, 2100, 7006.0668, 
5000, 2658.375157, 1185.035407, 2855.958155, 3958.803261, 768.209857, 
700, 856.2116643, 881.3946047, 1134.283766, 26172.3203, 14865.08983, 
197.3119048, 3000, 1234, 5624.147258)), row.names = c("Protein 1", 
"Protein 2", "Protein 3", "Protein 4", "Protein 5", "Protein 6", 
"Protein 7", "Protein 8", "Protein 9", "Protein 10", "Protein 11", 
"Protein 12", "Protein 13", "Protein 14", "Protein 15", "Protein 16", 
"Protein 17", "Protein 18", "Protein 19", "Protein 20"), class = "data.frame")

I appreciate your time and effort in helping me with this!


Thanks for posting the data. See my answer.

14 months ago
Mark ★ 1.6k

Your annotation_row data frame needs row names; this is how pheatmap matches annotations to the rows of the heatmap. The row names in your original data frame must match the row names in annotation_row exactly, so "Protein1" and "Protein 1" are different.

annotation_row = data.frame(
    GenePaths = factor(rep(c("Path1", "Path2", "Path3"), c(10, 4, 6)))
)
rownames(annotation_row) = paste0("Protein ", 1:20)

I also noticed that your annotation_row has the wrong dimensions: it has 17 + 5 + 18 = 40 rows, but it needs exactly one row per row of cultureconditions (20 in the sample you posted).
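One way to avoid both typos like "Protein1" vs "Protein 1" and mismatched lengths is to take the row names straight from the data and check the match before plotting. This is a minimal sketch, assuming a 20-row cultureconditions like the dput sample; a random stand-in is created here so the snippet runs on its own, and the group sizes 10/4/6 follow the example above. The pheatmap call is only made if the package is installed.

```r
# Stand-in for the posted data: 20 proteins x 4 conditions (random values)
set.seed(42)
cultureconditions <- as.data.frame(matrix(runif(80, 100, 30000), nrow = 20))
colnames(cultureconditions) <- c("condition1", "condition2", "condition3", "condition4")
rownames(cultureconditions) <- paste0("Protein ", 1:20)

# One annotation row per data row, named with the data's own row names
annotation_row <- data.frame(
  GenePaths = factor(rep(c("Path1", "Path2", "Path3"), c(10, 4, 6))),
  row.names = rownames(cultureconditions)
)

# Fail early if the annotation does not line up with the heatmap rows
stopifnot(
  nrow(annotation_row) == nrow(cultureconditions),
  identical(rownames(annotation_row), rownames(cultureconditions))
)

# The list name (GenePaths) must match the annotation column name
ann_colors <- list(GenePaths = c(Path1 = "#7570B3", Path2 = "#E7298A", Path3 = "#66A61E"))

if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(cultureconditions, scale = "row",
                     cluster_rows = FALSE, cluster_cols = FALSE,
                     annotation_row = annotation_row,
                     annotation_colors = ann_colors)
}
```

If the counts in rep() do not add up to nrow(cultureconditions), data.frame() stops with an error about differing numbers of rows, which points at the real problem sooner than a miscolored heatmap.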

To improve your code I suggest these edits:

Instead of:

new_column_names <- c("condition1", "condition2", "condition3", "condition4")  
colnames(cultureconditions) <- new_column_names

Just do:

colnames(cultureconditions) <- c("condition1", "condition2", "condition3", "condition4")

Thank you, it worked out very nicely! I appreciate all your help. Would you happen to know how I could add multiple annotations to a single protein? E.g. I would like to annotate Protein 4 with all 3 paths, or another protein with 2 paths.


Add another column to annotation_row and a matching entry to ann_colors, e.g.:

annotation_row = data.frame(
    GenePaths = factor(rep(c("Path1", "Path2", "Path3"), c(17, 5, 18))),
    ProteinPaths = factor(rep(c("Path1", "Path2", "Path3"), c(17, 5, 18)))
)

See the pheatmap docs; they have an example of this.
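A minimal sketch of the "one column per annotation" idea applied to pathways: each path gets its own yes/no column, so a protein can be marked in several paths at once (here Protein 4 is in all three). The pathway memberships below are hypothetical, the data is a random stand-in, and the grey "no" color is an arbitrary choice. The pheatmap call only runs if the package is installed.

```r
# Stand-in data: 20 proteins x 4 conditions (random values)
set.seed(7)
cultureconditions <- as.data.frame(matrix(runif(80, 100, 30000), nrow = 20))
colnames(cultureconditions) <- paste0("condition", 1:4)
rownames(cultureconditions) <- paste0("Protein ", 1:20)

# Hypothetical pathway membership: which proteins belong to each path.
# A protein may appear in several paths (Protein 4 is in all three).
members <- list(
  Path1 = c("Protein 1", "Protein 2", "Protein 3", "Protein 4"),
  Path2 = c("Protein 4", "Protein 5", "Protein 6"),
  Path3 = c("Protein 4", "Protein 6", "Protein 7", "Protein 8")
)

# One annotation column per path, with a yes/no flag for every protein
annotation_row <- data.frame(
  lapply(members, function(p) {
    factor(ifelse(rownames(cultureconditions) %in% p, "yes", "no"),
           levels = c("no", "yes"))
  }),
  row.names = rownames(cultureconditions)
)

# A color for each level of each annotation column (names must match)
path_cols <- c(Path1 = "#7570B3", Path2 = "#E7298A", Path3 = "#66A61E")
ann_colors <- lapply(path_cols, function(col) c(no = "grey95", yes = col))

if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(cultureconditions, scale = "row",
                     cluster_rows = FALSE, cluster_cols = FALSE,
                     annotation_row = annotation_row,
                     annotation_colors = ann_colors)
}
```

Because every column still has exactly one entry per protein, the row count never changes no matter how many paths a protein belongs to.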


Thank you very much again! I will give it a try. Also, my apologies, but could you clarify which pheatmap doc you are referring to? This has been my reference for pheatmaps so far: https://jokergoo.github.io/ComplexHeatmap-reference/book/integrate-with-other-packages.html


The reference manual has extensive examples. See page 5: https://cran.r-project.org/web/packages/pheatmap/pheatmap.pdf


Thank you very much! I did look through this doc. I think the trouble that I am having is that I would like to use the same data frame (GenePaths) to annotate multiple proteins. In other words, I would like to annotate select proteins to suggest that they are a part of Path1 and Path2 etc. Would you happen to know if I can do this with the same data frame?


There are two ways to do this. I would recommend what I suggested above: creating another column in the data frame. Another way is to add another factor level that indicates shared paths, e.g. factor(rep(c("Path1", "Path2", "Path3", "Path1.Path3"), c(17, 5, 18, 3))), where "Path1.Path3" indicates a shared path. Each protein still gets exactly one label, so adjust the counts so that they add up to the number of rows in your data.

Don't forget to add a color to ann_colors for each extra level you create.
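A minimal sketch of the shared-level approach that builds the labels from a membership list instead of hand-counting rep() values, so each protein gets exactly one label and the row count always matches. The memberships, the "none" fallback label and the variable names are hypothetical; "Path1.Path3" and its color "#38B41E" are reused from this thread.

```r
# Stand-in row names: 20 proteins, as in the posted sample
proteins <- paste0("Protein ", 1:20)

# Hypothetical pathway membership; proteins may belong to several paths
members <- list(
  Path1 = paste0("Protein ", 1:10),
  Path2 = paste0("Protein ", 11:14),
  Path3 = paste0("Protein ", c(3, 4, 15:20))
)

# For each protein, join the names of all paths it belongs to with "."
labels <- vapply(proteins, function(p) {
  hits <- names(members)[vapply(members, function(m) p %in% m, logical(1))]
  if (length(hits) == 0) "none" else paste(hits, collapse = ".")
}, character(1))

# Exactly one label per protein, so the row counts always match the data
annotation_row <- data.frame(GenePaths = factor(labels), row.names = proteins)

# One color per level; every shared level needs its own entry
ann_colors <- list(GenePaths = c(Path1 = "#7570B3", Path2 = "#E7298A",
                                 Path3 = "#66A61E", Path1.Path3 = "#38B41E"))
stopifnot(all(levels(annotation_row$GenePaths) %in% names(ann_colors$GenePaths)))
```

Here Protein 3 and Protein 4 come out as "Path1.Path3"; the final stopifnot() catches any level that has no color before pheatmap is called.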


Thank you very much! Apologies for the repeated questions. I tried your second suggestion, but it gave the following error:

Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length

Here is my code below:

colnames(cultureconditions) <- c("condition1", "condition2", "condition3", "condition4")

#Generate annotation for rows
annotation_row = data.frame(
    GenePaths = factor(rep(c("Path1", "Path2", "Path3","Path1.Path3"), c(17, 5, 18,3)))
)
rownames(annotation_row) = paste("Protein ", 1:40, sep = "")

#Color code annotations
ann_colors = list(GenePaths = c(Path1 = "#7570B3", Path2 = "#E7298A", Path3 = "#66A61E", Path1.Path2 = "#38B41E")
)

# Create the heatmap

heatmap <- pheatmap(cultureconditions, scale = "row", cluster_rows = FALSE, cluster_cols = FALSE, annotation_row = annotation_row,
    annotation_colors = ann_colors, angle_col = 45, fontsize = 16)

It seems that the rows of cultureconditions and annotation_row do not match:

> dim(cultureconditions)
[1] 40  4
> dim(annotation_row)
[1] 43  1

I am not sure how to provide these additional row names, given that it is the same row that I am trying to annotate a second time.


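annotation_row has 43 rows (17 + 5 + 18 + 3), but you are assigning only 40 row names. Changing 1:40 to 1:43 would create the row names Protein 41 to Protein 43, which do not exist in your data, so instead fix the counts so that they add up to 40. Each protein gets exactly one label, so the proteins labelled Path1.Path3 have to be taken out of the Path1 or Path3 counts.

You already identified the problem: the numbers of rows do not match. Make them match. Also note that ann_colors defines a color for Path1.Path2, but your factor level is Path1.Path3; the names in ann_colors must match the factor levels exactly.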
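A minimal sketch of both fixes applied to the code from the question, assuming 40 proteins named "Protein 1".."Protein 40". The corrected counts (14 instead of 17 for Path1) are a hypothetical choice; use whichever split matches your real memberships. The names path_counts and uncolored are also just for illustration.

```r
# Counts from the question: they sum to 43, but cultureconditions has 40 rows
path_counts <- c(Path1 = 17, Path2 = 5, Path3 = 18, Path1.Path3 = 3)
sum(path_counts)

# Hypothetical corrected counts: 3 proteins move from Path1 into the
# shared group, so the total stays at 40 (adjust to your real data)
path_counts <- c(Path1 = 14, Path2 = 5, Path3 = 18, Path1.Path3 = 3)
stopifnot(sum(path_counts) == 40)

annotation_row <- data.frame(
  GenePaths = factor(rep(names(path_counts), path_counts),
                     levels = names(path_counts)),
  row.names = paste0("Protein ", 1:40)
)

# Color names must match the factor levels exactly ("Path1.Path2" would not)
ann_colors <- list(GenePaths = c(Path1 = "#7570B3", Path2 = "#E7298A",
                                 Path3 = "#66A61E", Path1.Path3 = "#38B41E"))
uncolored <- setdiff(levels(annotation_row$GenePaths), names(ann_colors$GenePaths))
stopifnot(length(uncolored) == 0)
```

Note that rep() puts the shared group at the end (Protein 38 to Protein 40), so this only works if the rows of your data are ordered the same way.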
