I have RNA-seq data with 4 replicates. I used DESeq to call differential expression. When I plot the heat map, there is some variation within replicates (which is natural), but the heat map does not look good or show an obvious pattern. It is mouse data. What are my options? I don't want to show just the average/median of each group. Any suggestions/thoughts?
Real data is always messy. The question is whether this prevents you from reaching a conclusion. If so, you'll just have to design and do a better experiment. If you can reach a conclusion with this data, then you should present the data in a way that supports this conclusion. Heatmaps can be useful for a bird's-eye view of the data, but they may not be the best way of presenting supporting evidence. Think about what it is you want to say about the data and what would be the appropriate graphical representation. Typically, replicates are used to get an estimate of a central tendency (e.g. the mean) when experiments are noisy. We're rarely interested in the behavior of individual replicates (an exception may be trying to identify outliers). If you want more precise advice, you need to tell us more about what you're trying to achieve.
OK. I want to show that my small molecule helps in the treatment of a disease over time. I first had 5 replicates each of control mice and disease-induced mice, and selected around 500 genes that are differentially expressed in disease. Then I treated groups of mice (4 in each group) for a period of 1 day, 4 days and 6 days, along with control mice injected with PBS. I did RNA-seq on all samples and performed differential analysis on disease vs control animals to identify disease-related genes. Then I identified differentially regulated genes on days 1, 4 and 6 relative to their PBS controls. Finally, I am using the subset of genes identified in disease vs control (500) to show the relationship over time. As you might expect, only a few genes are differentially expressed on day 1, a lot on day 4, and a little fewer on day 6. DESeq was used. So I have two points that I want to make: 1. To show that, of the 500 disease genes, XX genes (in group replicates) are differentially expressed on day 1. 2. An overall presentation of the data for the disease vs control group, and the day 1, 4 and 6 data for PBS vs small-molecule-treated samples (with the replicates of each sample). I can limit it to the highest number of genes at a time point, e.g. 200 on day 4. Please note that the disease model replicates (disease vs control) are tight; the problem is with the replicates of the treated vs PBS samples on days 1, 4 and 6, where some of the animals don't show a consistent pattern within their group. Thanks for your suggestions.
Neither point 1 nor point 2 requires looking at the replicates individually. The replicates are for dealing with the noise. Now, the variation between replicates might be such that the central tendencies of the groups end up nearly the same. If you trust that your replicates are samples from the same underlying distribution, then you have to trust that the central tendency you compute tells you something about that distribution. Otherwise, you have to assume that some replicates are artefacts/outliers, and the question is: can you identify them? For example, are some of the mice male and others female, and can you find differences between male and female in the data? Or were some mice subjected to conditions that would justify removing them from the study? Or maybe the variability is part of the response to the treatment. Point 1 doesn't need a heatmap. You could just plot the fraction of disease genes that you find differentially expressed at each time point. In point 2, you're interested in what happens to the disease genes after treatment, so I would suggest using the median, as it is more robust to noise than the mean. If you're not interested in the actual values, you could also use a binary system in which a heatmap cell shows up/down if a majority of the replicates have a value above/below some threshold. You could even represent the half-and-half cases in this way.
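To make the three suggestions above concrete, here is a minimal Python sketch. It is only an illustration under assumptions, not anything from the original analysis: the function names, the gene IDs, the per-replicate log2 fold changes and the +/-1 thresholds are all hypothetical, and the lists of differentially expressed genes are assumed to have been exported from DESeq beforehand.

```python
from statistics import median


def fraction_de(disease_genes, de_by_day):
    """Fraction of the disease gene set called DE at each time point.

    disease_genes: iterable of gene IDs (e.g. the ~500 disease genes).
    de_by_day: dict mapping a time point to the gene IDs called DE there.
    """
    disease = set(disease_genes)
    return {
        day: len(disease & set(genes)) / len(disease)
        for day, genes in de_by_day.items()
    }


def majority_call(values, up=1.0, down=-1.0):
    """Collapse replicate values for one gene/time point into one label.

    Returns "up" or "down" if more than half of the replicates pass the
    corresponding threshold, "split" if exactly half do, else "none".
    The thresholds are arbitrary placeholders (assumption).
    """
    n = len(values)
    n_up = sum(v > up for v in values)
    n_down = sum(v < down for v in values)
    if 2 * n_up > n:
        return "up"
    if 2 * n_down > n:
        return "down"
    if 2 * n_up == n or 2 * n_down == n:
        return "split"
    return "none"


# Hypothetical example data (made up for illustration only).
disease_genes = ["g1", "g2", "g3", "g4"]
de_by_day = {1: ["g1"], 4: ["g1", "g2", "g3", "other"], 6: ["g1", "g2"]}

# Point 1: one number per time point, suitable for a simple bar plot.
fractions = fraction_de(disease_genes, de_by_day)

# Point 2: per-gene summaries across the 4 treated replicates on one day.
replicate_lfc = {
    "g1": [1.5, 2.0, 1.2, 0.1],
    "g2": [1.5, 2.0, 0.2, 0.1],
}
medians = {gene: median(vals) for gene, vals in replicate_lfc.items()}
calls = {gene: majority_call(vals) for gene, vals in replicate_lfc.items()}
```

The `fractions` dictionary can be plotted directly for point 1, while `medians` or `calls` would each fill one heatmap column per time point for point 2. Using a separate "split" label is just one possible way to show the half-and-half cases.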