I have a dataset of >100 different samples. Samples are from different genotypes (e.g. X, Y, Z) and 4 different time points (T0,1,2,3) with 3 biological replicates (R1,2,3). I'm measuring values for 50 different genes (in raws)
structure(list(Gene = structure(1:2, .Label = c("A", "B"), class = "factor"), X_T0_R1 = c(1.46559502, 0.220140568), X_T0_R2 = c(1.087642983, 0.237500819), X_T0_R3 = c(1.424945196, 0.21066267), X_T1_R1 = c(1.289943948, 0.207778662), X_T1_R2 = c(1.376535013, 0.488774258), X_T1_R3 = c(1.833390311, 0.182798731), X_T2_R1 = c(1.450753714, 0.247576125), X_T2_R2 = c(1.3094609, 0.390028842), X_T2_R3 = c(0.5953716, 1.007079177), X_T3_R1 = c(0.7906009, 0.730242116), X_T3_R2 = c(1.215333041, 1.012914813), X_T3_R3 = c(1.069312467, 0.780421013), Y_T0_R1 = c(0.053317766, 3.316414959), Y_T0_R2 = c(0.506623748, 3.599442788), Y_T0_R3 = c(0.713670106, 2.516735845), Y_T1_R1 = c(0.740998252, 1.444496448), Y_T1_R2 = c(0.648231834, 0.097957459), Y_T1_R3 = c(0.780499252, 0.187840968), Y_T2_R1 = c(0.35344654, 1.190274584), Y_T2_R2 = c(0.220223951, 1.367784148), Y_T2_R3 = c(0.432856978, 1.403057729), Y_T3_R1 = c(0.234963735, 1.232129062), Y_T3_R2 = c(0.353770497, 0.885122768), Y_T3_R3 = c(0.396091395, 1.333921747), Z_T0_R1 = c(0.398000559, 1.286528398), Z_T0_R2 = c(0.384759325, 1.122251177), Z_T0_R3 = c(1.582230097, 0.697419716), Z_T1_R1 = c(1.136843842, 0.804552001), Z_T1_R2 = c(1.275683837, 1.227821594), Z_T1_R3 = c(0.963349308, 0.968589683), Z_T2_R1 = c(3.765036263, 0.477443352), Z_T2_R2 = c(1.901023385, 0.832736132), Z_T2_R3 = c(1.407713024, 0.911920317), Z_T3_R1 = c(0.988333629, 1.095130142), Z_T3_R2 = c(0.618606729, 0.497458337), Z_T3_R3 = c(0.429823986, 0.471389536)), .Names = c("Gene", "X_T0_R1", "X_T0_R2", "X_T0_R3", "X_T1_R1", "X_T1_R2", "X_T1_R3", "X_T2_R1", "X_T2_R2", "X_T2_R3", "X_T3_R1", "X_T3_R2", "X_T3_R3", "Y_T0_R1", "Y_T0_R2", "Y_T0_R3", "Y_T1_R1", "Y_T1_R2", "Y_T1_R3", "Y_T2_R1", "Y_T2_R2", "Y_T2_R3", "Y_T3_R1", "Y_T3_R2", "Y_T3_R3", "Z_T0_R1", "Z_T0_R2", "Z_T0_R3", "Z_T1_R1", "Z_T1_R2", "Z_T1_R3", "Z_T2_R1", "Z_T2_R2", "Z_T2_R3", "Z_T3_R1", "Z_T3_R2", "Z_T3_R3"), class = "data.frame", row.names = c(NA, -2L))
I want to average reach 3 replicates (R1,2,3) for each time point (T0,1,3,5) and make a new matrix and then create a heat map with dendrogram.
How can I average each 3 replicates of each genotype at particular time point and make the new matrix? is this possible with tydr?
It's possible in tidyr/ggplot but it's awful. Just convert your data to a matrix and use heatmap.2 or complexHeatmap. I wouldn't normally average my replicates before making a heatmap though.
If you do not mind, could you please explain this up to the matrix, I know how to follow the rest
@OP: you may want to furnish example data and expected output.
Create a matrix with a row for each feature and a column for each sample, and put the values from your dataframe into the matrix. Then call
[Heatmap][1]()
or[heatmap][2].2
on the matrix. You can make the heatmap tiling in ggplot2 using geom_tile, but it's very difficult to overlay a dendrogram above such an image so you're better using Heatmap or heatmap.2