I have an expression matrix of intensities (7216 x 100) that I would like to plot using the geom_line() function of ggplot2.
This is what I tried:
pcaHC <- hclust(dist(sample.mat), method = "ward.D2") # calculate the distances and cluster
pca_subclusters <- cutree(pcaHC, k=40) # create 40 different clusters
sample_file_df <- data.frame(sample.mat, "cluster" = factor(pca_subclusters)) # merge the clusters with the intensity matrix
The data frame looks like this:
> head(sample_file_df[,c(1:3,100:101)])
X1 X2 X3 ... X100 cluster
15S_rRNA 47.00252 52.46925 57.51065 ... 133.99373 1
21S_rRNA 11.61435 13.90566 12.74778 ... 113.34820 1
HRA1 72.86330 71.72579 71.66715 ... 94.78852 2
ICR1 55.72980 62.21363 53.49190 ... 68.34249 3
LSR1 202.86542 221.03463 221.87639 ... 307.33516 4
NME1 289.14436 289.17267 291.15432 ... 367.86647 4
Now I have the matrix of intensities with the cluster number merged into it.
I would like to plot the intensities using geom_line() from ggplot2, with faceting to separate the data based on the clusters.
I know how to melt the data into the required form without the clusters:
bin <- colnames(sample_file_df[,1:100])
intensities <- t(sample_file_df[,1:100])
df <- data.frame(bin, intensities)
d.f2 <- melt(df[,1:10], id.vars = "bin")
But is there a way to include the cluster information in the melted table, so that I can separate the lines by cluster?
My code:
example <- dput(head(sample_file_df[,c(1:3,101)]))
structure(list(X1 = c(47.0025219774636, 11.61435429513, 72.8633017362537,
55.7297975392345, 202.865415753006, 289.14435756511), X2 = c(52.4692503895184,
13.9056586769545, 71.7257899110431, 62.2136287826649, 221.034632464551,
289.17266718698), X3 = c(57.5106531481446, 12.7477809541531,
71.6671538520602, 53.4918969402706, 221.876393120142, 291.154317537268
), cluster = structure(c(1L, 1L, 2L, 3L, 4L, 4L), .Label = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35",
"36", "37", "38", "39", "40"), class = "factor")), .Names = c("X1",
"X2", "X3", "cluster"), row.names = c("15S_rRNA", "21S_rRNA",
"HRA1", "ICR1", "LSR1", "NME1"), class = "data.frame")
bin <- colnames(example[,1:3])
intensities <- t(example[,1:3])
df <- data.frame(bin, intensities)
d.f2 <- melt(df, id.vars = "bin")
ggplot(d.f2, aes(bin, value, group = variable, colour = variable)) + geom_line()
Ideas would be appreciated. Thanks
I have found out that I can merge the two tables together based on the gene names and add the clusters that way, but is there a more efficient method?
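For reference, the merge approach mentioned above might look roughly like the sketch below. It reuses the question's transpose-and-melt steps on a rounded copy of the six example genes; the `clusters` lookup table and the `merged` name are made up for illustration, and `check.names = FALSE` is an added assumption so that gene names starting with a digit are not altered by data.frame().

```r
library(reshape2)

# Rounded copy of the six example genes from the question
example <- data.frame(
  X1 = c(47.0, 11.6, 72.9, 55.7, 202.9, 289.1),
  X2 = c(52.5, 13.9, 71.7, 62.2, 221.0, 289.2),
  X3 = c(57.5, 12.7, 71.7, 53.5, 221.9, 291.2),
  cluster = factor(c(1, 1, 2, 3, 4, 4)),
  row.names = c("15S_rRNA", "21S_rRNA", "HRA1", "ICR1", "LSR1", "NME1")
)

# Same transpose-and-melt steps as in the question
bin <- colnames(example[, 1:3])
intensities <- t(example[, 1:3])
df <- data.frame(bin, intensities, check.names = FALSE)  # keep gene names unchanged
d.f2 <- melt(df, id.vars = "bin")  # 'variable' now holds the gene names

# Hypothetical lookup table: one row per gene with its cluster
clusters <- data.frame(variable = rownames(example),
                       cluster = example$cluster)

# Attach the cluster to every melted row by matching on the gene name
merged <- merge(d.f2, clusters, by = "variable")
```

This works, but it needs a second table and a join on the gene names.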
If you want the clusters in the melted data frame, don't leave them out of the original data frame.
This doesn't work for me (AFAIK). The clusters are in a column. When I transpose the data to fit the structure I need, they also become a row in the new matrix, and I won't be able to melt them accordingly.
Or am I missing something?
Maybe: bin is a column and cluster in this case would be a row in the data.frame. I don't see how to combine this information.
Cluster is not a row according to your example of head(sample_file_df[,c(1:3,100:101)]) above. I didn't check what bin was. Replace it with the gene name column of sample_file_df. The idea is that you can give more than one column to melt.
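A minimal sketch of that suggestion, assuming reshape2 and ggplot2 are installed and using a rounded copy of the six example genes: the gene names are moved from the row names into a column (called `gene` here, a name chosen for illustration), and both `gene` and `cluster` are passed to melt as id variables, so no transpose is needed. The `long`, `bin` and `intensity` names are likewise illustrative.

```r
library(reshape2)
library(ggplot2)

# Rounded copy of the six example genes from the question
example <- data.frame(
  X1 = c(47.0, 11.6, 72.9, 55.7, 202.9, 289.1),
  X2 = c(52.5, 13.9, 71.7, 62.2, 221.0, 289.2),
  X3 = c(57.5, 12.7, 71.7, 53.5, 221.9, 291.2),
  cluster = factor(c(1, 1, 2, 3, 4, 4)),
  row.names = c("15S_rRNA", "21S_rRNA", "HRA1", "ICR1", "LSR1", "NME1")
)

# Move the gene names from the row names into a regular column
example$gene <- rownames(example)

# Melt with two id variables; every remaining column (X1..X3) becomes a bin
long <- melt(example, id.vars = c("gene", "cluster"),
             variable.name = "bin", value.name = "intensity")

# One line per gene, one panel per cluster
p <- ggplot(long, aes(bin, intensity, group = gene, colour = gene)) +
  geom_line() +
  facet_wrap(~ cluster)
print(p)
```

On the full sample_file_df the same call should apply unchanged, since melt treats every column not listed in id.vars as a measured variable. With 7216 genes it is probably worth dropping `colour = gene` to avoid an enormous legend.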