6.4 years ago
Vasu
Hi,
I have a matrix with samples as rows and genes as columns, containing gene expression values (RPKM).
The following is example data; the original data has more than 800 samples.
LINP1 EGFR RB1 TP53 CDKN2A MYC
Sample1 0.02 0.038798682 0.1423662 2.778587067 0.471403939 18.93687655
Sample2 0 0.059227225 0.208765213 0.818810739 0.353671882 1.379027685
Sample3 0 0.052116384 0.230437735 2.535040249 0.504061015 9.773089223
Sample4 0.06 0.199264618 0.261100548 2.516963635 0.63659138 11.01441624
Sample5 0 0.123521916 0.273330986 2.751309388 0.623572499 34.0563519
Sample6 0 0.128767634 0.263491811 2.882878373 0.359322715 13.02402045
Sample7 0 0.080097356 0.234511372 3.568192768 0.386217698 9.068928569
Sample8 0 0.017421323 0.247775683 5.109428797 0.068760572 15.7490551
Sample9 0 2.10281137 0.401582013 8.202902242 0.140596724 60.25989178
To make a scatter plot showing the correlation between two genes, I used ggscatter:
library(ggpubr)  # provides ggscatter()
ggscatter(A2, x = "LINP1", y = "RB1",
          add = "reg.line", conf.int = FALSE,
          cor.coef = TRUE, cor.method = "pearson",
          xlab = "LINP1", ylab = "RB1")
The scatter plot looks like this: LINP1 and RB1. Why do the points not follow the direction of the regression line? Do I need to change the scales of the x-axis and y-axis?
I also want to make a scatter plot like this scatterplot (Fig 2g in this research paper), where LINP1 expression is shown against all other genes in a single plot. Is this possible with any code?
Use a log scale, or log2-transform your data.
Oh yes, thanks. I just noticed that ggscatter has the arguments xscale and yscale, so I am setting xscale = "log2", yscale = "log2". Any idea about the plot shown in the paper?
The plot from the paper can be made with
par(mfrow=c(1,8))
before you make the 8 plots; make them without axes and create the axes afterwards. See ?plot for more details.
The issue in the plot that you've shown is with the distribution of data points for your LINP1 gene, i.e., the vast majority of points are either zero or close to it. The gene's distribution is heavily skewed, and this, in turn, 'confuses' whatever statistical methodology you apply to this data.
Thus, your Pearson correlation values are misleading and not valid.
You may consider logging the data (Edit: as noted by my colleague b.nota, too)
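One practical detail with logging this data: LINP1 contains exact zeros, and log2(0) is -Inf, so those samples cannot be placed on a log axis or used in a correlation. A minimal sketch of one common workaround, using the LINP1 and RB1 values from the example table, is below. The pseudocount of 1 and the object name `A2_log` are assumptions for illustration, not something stated in the thread.

```r
# Example values copied from the question (LINP1 and RB1 columns only).
A2 <- data.frame(
  LINP1 = c(0.02, 0, 0, 0.06, 0, 0, 0, 0, 0),
  RB1   = c(0.1423662, 0.208765213, 0.230437735, 0.261100548, 0.273330986,
            0.263491811, 0.234511372, 0.247775683, 0.401582013),
  row.names = paste0("Sample", 1:9)
)

# log2(0) is -Inf, so a plain log2 transform breaks on zero RPKM values.
# Assumption: add a pseudocount of 1 before taking the log, so that 0 maps to 0.
A2_log <- log2(A2 + 1)

# Pearson correlation on the raw and on the transformed values, for comparison.
cor_raw <- cor(A2$LINP1, A2$RB1, method = "pearson")
cor_log <- cor(A2_log$LINP1, A2_log$RB1, method = "pearson")
print(c(raw = cor_raw, log2 = cor_log))
```

The transformed data frame can then be passed to ggscatter in place of A2. Note that the transform does not remove the pile-up of samples at zero, so the caveat about the correlation still applies.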
Yes, thanks. I saw that just now. Any idea about the plot in the paper?
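The par(mfrow = ...) suggestion made earlier in the thread could be sketched in base R roughly as follows, using the example table from the question. The layout of the paper's figure is not known here, so several choices are assumptions: LINP1 goes on a shared y-axis, each other gene gets its own panel, values are log2(RPKM + 1), and the number of panels is taken from the data rather than fixed at 8.

```r
# Example values copied from the question.
A2 <- data.frame(
  LINP1  = c(0.02, 0, 0, 0.06, 0, 0, 0, 0, 0),
  EGFR   = c(0.038798682, 0.059227225, 0.052116384, 0.199264618, 0.123521916,
             0.128767634, 0.080097356, 0.017421323, 2.10281137),
  RB1    = c(0.1423662, 0.208765213, 0.230437735, 0.261100548, 0.273330986,
             0.263491811, 0.234511372, 0.247775683, 0.401582013),
  TP53   = c(2.778587067, 0.818810739, 2.535040249, 2.516963635, 2.751309388,
             2.882878373, 3.568192768, 5.109428797, 8.202902242),
  CDKN2A = c(0.471403939, 0.353671882, 0.504061015, 0.63659138, 0.623572499,
             0.359322715, 0.386217698, 0.068760572, 0.140596724),
  MYC    = c(18.93687655, 1.379027685, 9.773089223, 11.01441624, 34.0563519,
             13.02402045, 9.068928569, 15.7490551, 60.25989178)
)

# Assumption: log2 with a pseudocount of 1, because LINP1 contains zeros.
logA   <- log2(A2 + 1)
others <- setdiff(colnames(logA), "LINP1")
r_values <- setNames(numeric(length(others)), others)

# One row of panels, narrow inner margins, outer margin on the left for a shared label.
old_par <- par(mfrow = c(1, length(others)), mar = c(4, 1, 2, 1), oma = c(0, 3, 0, 0))
for (g in others) {
  x <- logA[[g]]
  y <- logA$LINP1
  r_values[g] <- cor(x, y, method = "pearson")
  # Draw the panel without axes, then add the axes afterwards.
  plot(x, y, axes = FALSE, xlab = g, ylab = "", pch = 16,
       main = sprintf("r = %.2f", r_values[g]))
  axis(1)
  if (g == others[1]) axis(2)  # y-axis only on the leftmost panel
  box()
  abline(lm(y ~ x))            # least-squares regression line
}
mtext("log2(LINP1 RPKM + 1)", side = 2, outer = TRUE, line = 2)
par(old_par)
```

With the full data set, the same loop applies unchanged; only the number of panels grows, so the device width may need to be increased.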