Big dataset: how to plot gene expression differences?
4.1 years ago
camillab. ▴ 160

Hi,

I have a big dataset with 51 samples (in columns) and more than 1,000 genes (in rows). Which plot should I use? Is there any way I can plot differences in gene expression (= reads, not log2FC) across the different samples? The only solution I have thought of so far is to plot each gene as a barplot, but that takes a ridiculous amount of time.

Thank you in advance,

Camila

R plot dataset • 1.1k views

Sounds like the realm of heatmaps, using the standardized (Z-scored) expression values. Is that an option?


Is it a good idea to plot 1,000 genes in a single plot?


Why not? The plot is then about global patterns rather than individual genes, obviously.


What would you suggest instead?


I did it, but I wasn't really convinced by the results. I guess it's the only way to go.


Why not? Because of the plot size! Plotting this many genes could obscure the message you want the plot to convey. There should be a way to select the informative ones (DE or variably expressed genes) for plotting. @camillab, where are these genes coming from? Did you perform DE analysis or ...?
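
To illustrate the "variably expressed genes" idea, here is a minimal sketch in plain Python (the thread is tagged R, so this is only an illustration of the logic). It assumes the expression values are already on a log scale and stored as a dictionary from gene name to one value per sample. The function name `top_variable_genes`, the toy data, and the cutoff `n_top` are all made up for the example.

```python
from statistics import pvariance


def top_variable_genes(expr, n_top):
    """Return the names of the n_top genes with the highest variance across samples.

    expr maps gene name -> list of log-scale expression values, one per sample.
    """
    ranked = sorted(expr, key=lambda gene: pvariance(expr[gene]), reverse=True)
    return ranked[:n_top]


# Toy data (invented for illustration): 4 genes x 4 samples.
expr = {
    "geneA": [5.0, 5.1, 4.9, 5.0],
    "geneB": [2.0, 8.0, 2.5, 7.5],
    "geneC": [3.0, 3.0, 3.0, 3.0],
    "geneD": [1.0, 2.0, 3.0, 4.0],
}

selected = top_variable_genes(expr, n_top=2)
print(selected)
```

How many genes to keep is a judgment call; the cutoff here is arbitrary and not a recommendation.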


I did a hierarchical cluster analysis and found that a group of samples (= replicates of the same condition) does not cluster as expected. So I extracted the genes that are highly specific to that cluster, i.e. more dissimilar in those samples compared to the others, and I wanted to plot them to see how they look compared to the rest. I'm not sure it is the right approach, but I thought it would be interesting to figure out which genes contribute to the clustering. I have already run the PCA, but I find the "loadings" more confusing than anything else, since each gene has a contribution to every PC.
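
One simple way to rank genes by how much they separate a group of samples from the rest is to compare mean log-expression inside and outside the group. The sketch below is plain Python and is not necessarily how the genes in this post were extracted. The function name `rank_by_group_difference`, the sample names, and the values are hypothetical.

```python
from statistics import mean


def rank_by_group_difference(expr, sample_names, group):
    """Rank genes by |mean(group) - mean(other samples)|, largest difference first.

    expr maps gene name -> list of log-scale values ordered like sample_names.
    group is the set of sample names forming the unexpected cluster.
    """
    in_group = [i for i, s in enumerate(sample_names) if s in group]
    others = [i for i, s in enumerate(sample_names) if s not in group]
    diffs = {}
    for gene, values in expr.items():
        group_mean = mean([values[i] for i in in_group])
        other_mean = mean([values[i] for i in others])
        diffs[gene] = group_mean - other_mean
    return sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy data (invented): samples s3 and s4 play the role of the odd cluster.
sample_names = ["s1", "s2", "s3", "s4"]
expr = {
    "geneA": [5.0, 5.0, 5.0, 5.0],
    "geneB": [2.0, 2.0, 8.0, 8.0],
    "geneC": [6.0, 6.0, 3.0, 3.0],
}

ranking = rank_by_group_difference(expr, sample_names, group={"s3", "s4"})
for gene, diff in ranking:
    print(gene, diff)
```

A plain difference of means ignores within-group variability, so it is a quick exploratory ranking rather than a test for differential expression.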


Sure, you have to select genes that are informative, but that is pretty much the only problem. It is not uncommon to get > 1000 DEGs, depending on how strong the phenotype is. Scale the log-counts to Z-scores per gene, then apply hclust and plot the heatmap. That is a pretty straightforward way to explore patterns in the data across groups.
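
The Z-scoring step can be sketched as follows in plain Python, as an illustration only: each gene (row) is centered on its own mean and divided by its own standard deviation, so the heatmap colors show relative changes across samples rather than absolute expression levels. The function name `zscore_rows` and the toy values are assumptions. The sample standard deviation (n - 1) is used here, and sending constant genes to zeros is a choice made for this example.

```python
from statistics import mean, stdev


def zscore_rows(expr):
    """Scale each gene (row) to mean 0 and standard deviation 1 across samples.

    expr maps gene name -> list of log-scale values, one per sample (at least 2).
    """
    scaled = {}
    for gene, values in expr.items():
        mu = mean(values)
        sd = stdev(values)  # sample standard deviation, n - 1 in the denominator
        if sd == 0:
            # A gene with no variation has no pattern to show; avoid dividing by zero.
            scaled[gene] = [0.0] * len(values)
        else:
            scaled[gene] = [(v - mu) / sd for v in values]
    return scaled


# Toy log-scale values (invented for illustration).
logcounts = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [4.0, 4.0, 4.0],
    "geneC": [10.0, 12.0, 14.0],
}

z = zscore_rows(logcounts)
for gene, values in z.items():
    print(gene, values)
```

After scaling, geneA and geneC have the same profile even though their absolute levels differ, which is what lets hierarchical clustering group genes by pattern rather than by magnitude.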
