Hello everyone,
I am very new to GO (Gene Ontology) analysis and would like to know the best way to display my data visually and the best tools to use for the job. I have searched through some of the topics here already, and they have helped me figure out some things and given me some context, but I am still quite confused, so your patience is appreciated.
My data: Gene expression data from mass spec and microarray (~400 unique UNIPROT_IDs). There are three different experimental conditions, and each one has an experimental and a control sample. I have normalized the data in each of these three lists to their respective controls and log2 transformed it. So my data is essentially three gene lists, one for each condition, with a log2 expression value for each gene represented in that condition.
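For reference, here is a minimal sketch of what "normalized to the control and log2 transformed" can mean for one condition, assuming normalization is a simple ratio of experimental to control intensity per gene (your actual normalization may differ). The function name and the gene IDs are hypothetical placeholders.

```python
import math


def log2_ratios(experimental, control):
    """Return {gene_id: log2(experimental / control)} for one condition.

    Assumes both inputs are {gene_id: intensity} dicts with raw, positive
    intensities. Genes missing from either sample, or with non-positive
    values, are skipped because the log ratio is undefined for them.
    """
    ratios = {}
    for gene_id, exp_value in experimental.items():
        ctrl_value = control.get(gene_id)
        if ctrl_value is None or ctrl_value <= 0 or exp_value <= 0:
            continue
        ratios[gene_id] = math.log2(exp_value / ctrl_value)
    return ratios


# Hypothetical placeholder IDs and intensities, for illustration only.
exp_sample = {"geneA": 8.0, "geneB": 1.0, "geneC": 5.0}
ctrl_sample = {"geneA": 2.0, "geneB": 4.0}
print(log2_ratios(exp_sample, ctrl_sample))  # {'geneA': 2.0, 'geneB': -2.0}
```

Positive values mean higher than control, negative values mean lower, and a gene with no matching control value simply drops out of that condition's list.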
What I want: For each experimental condition, I would like to see which GO terms the genes cluster into. So if a set of genes is associated with a particular disease ontology (especially important!), biochemical pathway, or cellular function, those genes would cluster together in a dendrogram or some other kind of tree, and the expression level of each gene in that cluster could be seen. I also want to compare how a given gene's expression changes between the different experimental conditions, which brings me to the next point.
The problem: I had thought that a GO-clustered heat map would be what I wanted, but not all genes appear in every experimental condition. If I clustered the full gene list and displayed a heat map for all 3 conditions, I would get a lot of empty cells wherever a gene does not appear in a condition. I have never seen heat maps presented this way, so surely this is not the best way to show all three conditions against each other. What would be the best way to present my data? The alternative seems to be three separate heat maps, one per condition, but I think they will be difficult to compare visually side by side.
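To make the "empty cells" problem concrete, here is a minimal sketch that merges the three per-condition lists into one gene × condition table, using None for a gene that was not measured in a condition. The optional `min_present` filter (keep only genes seen in at least that many conditions) is one possible way to reduce empty cells, not a recommendation from any particular tool; the function name, condition names, and gene IDs are hypothetical.

```python
def combine_conditions(condition_tables, min_present=1):
    """Merge per-condition {gene_id: log2 value} dicts into a gene x condition table.

    Returns (conditions, rows), where rows maps gene_id -> list of values in
    condition order. A missing measurement is None (an empty heat-map cell).
    Genes measured in fewer than `min_present` conditions are dropped.
    """
    conditions = list(condition_tables)
    all_genes = sorted(set().union(*condition_tables.values()))
    rows = {}
    for gene_id in all_genes:
        values = [condition_tables[c].get(gene_id) for c in conditions]
        if sum(v is not None for v in values) >= min_present:
            rows[gene_id] = values
    return conditions, rows


# Hypothetical log2 values for three conditions.
tables = {
    "cond1": {"geneA": 1.0, "geneB": -0.5},
    "cond2": {"geneA": 0.3},
    "cond3": {"geneC": 2.0},
}
print(combine_conditions(tables)[1])
# {'geneA': [1.0, 0.3, None], 'geneB': [-0.5, None, None], 'geneC': [None, None, 2.0]}
print(combine_conditions(tables, min_present=2)[1])
# {'geneA': [1.0, 0.3, None]}
```

Every None here would be a blank cell in a combined heat map; raising `min_present` trades coverage for a denser, easier-to-compare table.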
GO Tools: My PI has expressed interest in using DAVID for clustering because of the large number of papers that cite it, but if there is a better choice for my purposes I would be open to suggestions. I suspect that I need to use the DAVID functional clustering tool and then prepare its output for some software that can read the clustering as a dendrogram and integrate my expression values for each gene in the clusters. However, I don't even know what these things are called, so I can't search for answers, which is making this very difficult. Furthermore, when I submit my gene list to DAVID, nothing clusters in disease ontology, and this is what I am most interested in. I have also tried to prepare DAVID output for Java TreeView, but I am having difficulty getting it into a format that TreeView accepts.
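As I understand it, DAVID's clustering tools measure how strongly genes (or terms) share annotations using kappa statistics. The sketch below is a plain Cohen's kappa between two genes' GO-term sets, similar in spirit but not DAVID's exact implementation; `1 - kappa` could then serve as a distance for building a dendrogram in whatever clustering tool you use. The function name, term IDs, and gene IDs are hypothetical placeholders.

```python
def annotation_kappa(terms_a, terms_b, all_terms):
    """Cohen's kappa agreement between two genes' GO-term memberships.

    terms_a, terms_b: sets of term IDs annotated to each gene.
    all_terms: every term being considered (the annotation universe).
    Returns a value in [-1, 1], or None when kappa is undefined
    (empty universe, or both genes carry every term / no term).
    """
    universe = set(all_terms)
    n = len(universe)
    if n == 0:
        return None
    a_set = set(terms_a) & universe
    b_set = set(terms_b) & universe
    both = len(a_set & b_set)        # term annotated to both genes
    only_a = len(a_set - b_set)      # term annotated to gene A only
    only_b = len(b_set - a_set)      # term annotated to gene B only
    neither = n - both - only_a - only_b
    observed = (both + neither) / n
    # Agreement expected by chance, from each gene's annotation rate.
    expected = ((both + only_a) * (both + only_b)
                + (only_b + neither) * (only_a + neither)) / n ** 2
    if expected == 1:
        return None
    return (observed - expected) / (1 - expected)


# Hypothetical annotations over a four-term universe.
terms = ["term1", "term2", "term3", "term4"]
gene_a = {"term1", "term2"}
gene_b = {"term1", "term2"}
gene_c = {"term3", "term4"}
print(annotation_kappa(gene_a, gene_b, terms))  # 1.0 (identical annotation)
print(annotation_kappa(gene_a, gene_c, terms))  # -1.0 (no shared terms)
```

Genes with high pairwise kappa would end up near each other in a tree built from these scores, which is roughly the "genes cluster by shared GO terms" picture described above.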
Main questions: 1. What is the best way to present my data visually? 2. What are the best tools for accomplishing this?
Any sort of basic direction would be greatly appreciated. I feel very lost and don't really know where to begin with all of this. Thank you!
Thank you! I'll come back to this in a little while once I am familiar with other ways to represent my data. I know a bit of R, but not enough to do much with it, so the page you linked (and the flowchart on it) will be very helpful. And sorry about posting my last comment as an answer; my mistake.
Thank you all for your time.