Hi..
I have differential gene expression profiles for 164 drug treatments (some drugs have replicates, i.e. the same drug was applied two or three times). The dimension of my data.frame is 22268 * 453 (probes * samples).
I tried hierarchical clustering to see whether drugs that produce similar gene expression profiles group together, but the heatmap I got is so confusing that I cannot get anything out of it. It is a 45 MB PDF and loads very slowly.
Can anybody guide me on how to interpret a heatmap of such a large dataset?
One more thing: if I compare a new drug-treated gene expression profile (dim 22268 * 1) with my existing heatmap, is it possible to find out which cluster the new profile belongs to? In my data, the column names are the drugs applied, the columns contain gene expression values, and the rows are the probe names.
Hint: you don't need all 22k+ rows
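To make the hint concrete, here is a minimal sketch of one common way to shrink the matrix before clustering: average the replicate columns per drug, then keep only the probes that vary most across drugs. This is an assumption about what "not all rows" could mean in practice, not a prescribed method. The function names, the toy data, and the `n_keep` cutoff are all hypothetical. It is written in plain Python so it stays self-contained; the same two steps carry over directly to R or NumPy.

```python
from collections import defaultdict
from statistics import mean, pvariance


def average_replicates(columns):
    """Collapse replicates: `columns` is a list of (drug_name, values) pairs."""
    grouped = defaultdict(list)
    for drug, values in columns:
        grouped[drug].append(values)
    # Average the replicate columns probe by probe.
    return {drug: [mean(vals) for vals in zip(*reps)]
            for drug, reps in grouped.items()}


def top_variable_rows(profiles, n_keep):
    """Return (sorted) indices of the n_keep probes with the highest variance."""
    drugs = list(profiles)
    n_probes = len(profiles[drugs[0]])
    variances = [pvariance([profiles[d][i] for d in drugs])
                 for i in range(n_probes)]
    ranked = sorted(range(n_probes), key=lambda i: variances[i], reverse=True)
    return sorted(ranked[:n_keep])


# Toy data (hypothetical): 4 probes, 3 drugs, drugA measured twice.
columns = [
    ("drugA", [1.0, 0.1, -2.0, 0.0]),
    ("drugA", [3.0, 0.1, -2.0, 0.0]),
    ("drugB", [-2.0, 0.0, 2.0, 0.1]),
    ("drugC", [2.0, 0.1, -1.0, 0.0]),
]

profiles = average_replicates(columns)          # one column per drug
keep = top_variable_rows(profiles, n_keep=2)    # indices of informative probes
filtered = {d: [v[i] for i in keep] for d, v in profiles.items()}
print(keep, filtered)
```

With the real data this would turn 453 columns into 164 and 22268 rows into whatever cutoff you choose (a few hundred to a few thousand probes is a typical starting point, but that number is a judgment call). A heatmap of the reduced matrix is much smaller and usually far easier to read.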
I think some papers talk about this, for example: "Raw microarray data were subjected to quality control and preprocessing procedures to improve data consistency and reduce batch effects (Iskar et al, 2010). For CMap, this resulted in a usable set of expression measurements of 8964 genes in three human cell lines (HL60, MCF7 and PC3)". But I don't understand how they do that.
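On the second part of the question (which cluster a new 22268 * 1 profile belongs to): the heatmap itself cannot tell you, but one simple option is to compare the new profile with the average profile (centroid) of each existing cluster and pick the best-correlated one. The sketch below assumes you have already cut the dendrogram into clusters and that the new profile uses the same probes in the same order. The cluster names, the toy values, and the choice of Pearson correlation are illustrative assumptions, not the only reasonable choice.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def centroid(members):
    """Probe-wise mean of the profiles in one cluster."""
    return [sum(vals) / len(vals) for vals in zip(*members)]


def assign_cluster(new_profile, clusters):
    """Return the best-matching cluster name and all correlation scores."""
    scores = {name: pearson(new_profile, centroid(members))
              for name, members in clusters.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical clusters, e.g. obtained by cutting the dendrogram at some height.
clusters = {
    "cluster_1": [[2.0, -1.0, 0.5, -2.0], [1.8, -0.8, 0.2, -1.5]],
    "cluster_2": [[-1.5, 2.0, 0.1, 1.0], [-2.0, 1.5, -0.2, 1.2]],
}
new_profile = [1.5, -1.2, 0.3, -1.8]

best, scores = assign_cluster(new_profile, clusters)
print(best, scores)
```

If the best correlation is low for every cluster, it is safer to treat the new drug as not belonging to any existing cluster than to force an assignment.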