4.1 years ago
camillab.
Hi,
I have a big dataset with 51 samples (in columns) and more than 1000 genes (in rows). Which plot should I use? Is there any way I can plot differences in gene expression (i.e. read counts, not log2FC) across the different samples? The only solution I have thought of so far is to plot each gene as a barplot, but that takes a ridiculous amount of time.
Thank you in advance
Camila
Sounds like the realm of heatmaps, using the standardized (Z-scored) expression values. Is that an option?
Is it a good idea to plot 1000 genes in a plot?
Why not? It is then about global patterns rather than individual genes obviously.
What would you suggest instead?
I did it, and I wasn't really convinced by the results, but I guess it's the only way to go.
Why not? Because of the plot size! Plotting this many genes could obscure the message you want the plot to convey. There should be a way to select the informative ones (DE or variably expressed genes) for plotting. @camillab, where are these genes coming from? Did you perform DE or ...?
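To make the "variably expressed genes" idea concrete, here is a minimal sketch in Python with NumPy. It assumes a genes x samples matrix of log-scale expression values; the function name `top_variable_genes`, the `n_top` cutoff, and the toy data are illustrative assumptions, not something from this thread.

```python
import numpy as np

def top_variable_genes(log_expr, gene_names, n_top=100):
    """Return the names and rows of the n_top genes with the highest
    variance across samples (rows = genes, columns = samples)."""
    log_expr = np.asarray(log_expr, dtype=float)
    variances = log_expr.var(axis=1, ddof=1)          # per-gene sample variance
    order = np.argsort(variances)[::-1][:n_top]       # most variable first
    return [gene_names[i] for i in order], log_expr[order]

# Toy data (hypothetical): 20 genes x 6 samples of low-variance noise,
# with one gene made clearly variable across samples.
rng = np.random.default_rng(0)
toy = rng.normal(size=(20, 6))
toy[3] = [0, 10, 0, 10, 0, 10]
names = [f"gene{i}" for i in range(20)]

picked, sub = top_variable_genes(toy, names, n_top=5)
```

Variance ranking is only one possible filter; a DE result table, if you have one, would be an equally reasonable way to choose the rows.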
I did a hierarchical cluster analysis and found that a group of samples (= replicates of the same condition) does not cluster as expected, so I extracted the genes that are highly specific to that cluster, i.e. most dissimilar in those samples compared to the others, and I wanted to plot them to see how they look compared to the rest. I'm not sure it is the right approach, but I thought it would be interesting to try to figure out which genes contribute to the clustering. I have already run the PCA and looked at the loadings, but they confuse me more than anything else, since each gene has a contribution to every PC.
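On the loadings: it is true that every gene has a loading on every PC, so one common way to read them is to take one PC at a time and rank genes by the absolute value of their loading on it. The sketch below does this with a plain SVD in NumPy. It assumes a genes x samples matrix of log-scale values; `pc_loadings`, `top_genes_for_pc`, and the toy data are hypothetical names and values used for illustration only.

```python
import numpy as np

def pc_loadings(log_expr):
    """PCA via SVD with samples as observations.
    Returns (loadings, explained): loadings is genes x PCs,
    explained is the fraction of variance captured by each PC."""
    X = np.asarray(log_expr, dtype=float).T    # samples x genes
    X = X - X.mean(axis=0)                     # center each gene
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return Vt.T, explained

def top_genes_for_pc(loadings, gene_names, pc=0, n_top=10):
    """Rank genes by absolute loading on a single PC (0-based index)."""
    order = np.argsort(np.abs(loadings[:, pc]))[::-1][:n_top]
    return [(gene_names[i], float(loadings[i, pc])) for i in order]

# Toy data (hypothetical): 10 genes x 6 samples; gene2 separates the
# first three samples from the last three, everything else is noise.
rng = np.random.default_rng(1)
toy = rng.normal(scale=0.1, size=(10, 6))
toy[2] += np.array([5, 5, 5, -5, -5, -5])
names = [f"gene{i}" for i in range(10)]

loadings, explained = pc_loadings(toy)
top = top_genes_for_pc(loadings, names, pc=0, n_top=3)
```

If the odd group of samples separates from the rest along one particular PC in the score plot, the top-ranked genes for that PC are candidates for what drives the split. The sign of a loading is arbitrary, which is why the ranking uses the absolute value.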
Sure, you have to select genes that are informative, but that is pretty much the only problem. It is not uncommon to get > 1000 DEGs, depending on how strong the phenotype is. Scale the log-counts to Z-scores per gene, then apply hclust and plot the heatmap. It is a pretty straightforward way to explore patterns in the data across groups.
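A rough Python equivalent of that recipe (row-wise Z-scores, then hierarchical clustering of genes and samples) might look like the sketch below. It uses SciPy's `linkage` and `leaves_list` in place of R's hclust; the average linkage, the Euclidean metric, the helper names, and the toy counts are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

def zscore_rows(log_expr):
    """Standardize each gene (row) to mean 0 and SD 1 across samples."""
    log_expr = np.asarray(log_expr, dtype=float)
    mean = log_expr.mean(axis=1, keepdims=True)
    sd = log_expr.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0                      # constant genes get Z = 0
    return (log_expr - mean) / sd

def cluster_order(matrix, method="average", metric="euclidean"):
    """Leaf order of a hierarchical clustering of the rows of matrix."""
    return leaves_list(linkage(matrix, method=method, metric=metric))

def clustered_zscores(log_expr):
    """Z-score the rows, then reorder rows and columns by clustering."""
    z = zscore_rows(log_expr)
    row_order = cluster_order(z)           # genes
    col_order = cluster_order(z.T)         # samples
    return z[np.ix_(row_order, col_order)], row_order, col_order

# Toy counts (hypothetical): genes 0-3 are high in samples 0-2,
# genes 4-7 are high in samples 3-5.
rng = np.random.default_rng(2)
counts = np.full((8, 6), 10.0)
counts[:4, :3] = 1000.0
counts[4:, 3:] = 1000.0
log_counts = np.log2(counts + 1) + rng.normal(scale=0.1, size=counts.shape)

ordered, row_order, col_order = clustered_zscores(log_counts)
```

The reordered matrix can then be drawn with any heatmap function, for example `matplotlib.pyplot.imshow(ordered, cmap="RdBu_r", aspect="auto")`. Because each gene is standardized separately, the colors show where a gene is above or below its own average, not absolute expression levels.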