Gene expression data set scaling
7.3 years ago
1769mkc ★ 1.2k

I have RNA-seq data and I am trying to make a heatmap, but a few values are very low and stretch the lower end of the range, while the upper end is reasonable. So I am trying to scale the data. My question: if I scale the data, does it preserve the true biological meaning? When I plot the scaled data against the unscaled data, I see quite a difference.

Any suggestions would be highly appreciated.

RNA-Seq R • 5.8k views

I am not sure I understand your question correctly, but a colleague once scaled the color range of a heatmap using a log scale. He plotted a histogram of the FPKM values to decide on the color range.

P.S. I am not sure if the biological sense is retained, but I presume it should still make sense.
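
A minimal base-R sketch of that log-scale idea, for illustration only. The toy matrix `fpkm`, the pseudocount of 1, and the choice of the observed range as color limits are all assumptions, not something taken from the colleague's analysis.

```r
# Hypothetical toy FPKM matrix: 4 genes (rows) x 3 samples (columns).
fpkm <- matrix(
  c(0, 1, 3,
    7, 15, 31,
    63, 127, 255,
    511, 1023, 2047),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# Log-transform with a pseudocount of 1 (an assumption) so zeros stay finite.
log_fpkm <- log2(fpkm + 1)

# Histogram of the transformed values, computed without drawing;
# call plot(h) to look at it and pick limits for the color range.
h <- hist(as.vector(log_fpkm), breaks = 10, plot = FALSE)

# One possible choice of color limits: the observed range.
color_limits <- range(log_fpkm)
```

This only works for non-negative input, since FPKM values cannot be negative by definition; negative values suggest the data have already been transformed in some way.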


Well, I have values on the order of -50 and -60. I certainly don't want to put those in the heatmap as they are, but at the same time I want to retain those differences. I'm using pheatmap, which doesn't have the heatmap.2-style color key where you can plot the histogram or density of the values.


As I understand it, RPKM and FPKM values are relative measures on a linear scale. For example, if your values range from -50 to 500, you could shift them so that -50 becomes 0. In that case, your values on the heatmap would run from 0 to 550, but I'm not sure whether that would be considered data manipulation. I would wait for someone else to reply on that.


That's true, but it may be noise as well, so I'm not sure.


Yes, that certainly is a possibility you can't rule out. I would, however, like to know how you got read counts on a negative scale, and what that means.


Well, those are all FPKM values. I have data from about 5 cell types, and I think most likely those genes are highly down-regulated, unless it's noise.

7.3 years ago

Presuming your genes are in rows and your samples are in columns, scaling rows will preserve the biology within genes. If you do clustering, the clustering result will change, but that's typically less of an issue.


Yes, my genes are in rows and my samples are in columns. How do I do the scaling? Is it scale(df) or something else?


How you want to scale things is completely up to you.
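
As one possible illustration of the row scaling discussed above: base R's `scale()` standardizes columns, so `scale(df)` on a genes-by-samples table would standardize samples, not genes. A minimal sketch for gene-wise (row-wise) z-scores, using a hypothetical toy matrix `mat`:

```r
# Hypothetical toy matrix: genes in rows, samples in columns.
mat <- matrix(
  c(1, 2, 3,
    10, 20, 60,
    -50, 0, 50),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("sample", 1:3))
)

# scale() centers and scales COLUMNS, so scale(mat) would standardize samples.
# To standardize each gene (row): transpose, scale, and transpose back.
row_scaled <- t(scale(t(mat)))

# Each gene now has mean 0 and standard deviation 1 across samples.
row_means <- rowMeans(row_scaled)
row_sds <- apply(row_scaled, 1, sd)
```

pheatmap also accepts `scale = "row"`, which applies this kind of row-wise standardization for display without changing the input matrix.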


Hi Devon! I read your comment and I am now a bit unsure about the type of scaling I need to perform on my data. If I have genes in rows and samples in columns, and the intention is to cluster the samples, should scaling be done on columns or on rows? I would appreciate it if you could explain this to me. Thanks.


Generally you want to scale things such that highly-expressed genes aren't driving the clustering, which would mean by rows. However, you can also do things like vst() in DESeq2 to put things on a more useful scale to begin with.
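
A small base-R sketch of why row scaling matters for sample clustering. The toy matrix and gene names are hypothetical, and Euclidean distance is an assumption; other distance measures may behave differently.

```r
# Hypothetical toy data: 3 genes x 4 samples.
# gene_hi is highly expressed and groups samples as (s1, s2) vs (s3, s4);
# the two low-expressed genes group them as (s1, s3) vs (s2, s4).
mat <- rbind(
  gene_hi  = c(1000, 1010, 1200, 1210),
  gene_lo1 = c(1, 5, 1, 5),
  gene_lo2 = c(2, 8, 2, 8)
)
colnames(mat) <- paste0("s", 1:4)

# Samples are being clustered, so distances are computed between columns.
d_raw <- as.matrix(dist(t(mat)))

# Gene-wise (row-wise) z-scores, then the same sample distances.
mat_z <- t(scale(t(mat)))
d_scaled <- as.matrix(dist(t(mat_z)))

# Nearest neighbour of sample s1 before and after scaling.
nn_raw <- names(which.min(d_raw["s1", -1]))
nn_scaled <- names(which.min(d_scaled["s1", -1]))
```

With this toy data, the nearest neighbour of s1 is s2 on the raw values, because gene_hi dominates the distance, and s3 after row scaling, where all three genes contribute equally.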


Thanks for the answer, Devon! However, when I do gene-wise scaling, all genes will have sd = 1. Thus, if I am not mistaken, the significance of genes will be lost. So I am a little unclear about what criteria a classification algorithm would use to classify patients into potential subgroups of a cancer.
By the way, my gene expression data is derived from microarrays and is on a log2 scale.


Then you're not clustering, you're classifying, which is completely different. You should post such things as a new question.


Hi @Devon, I normalized my read counts using vst() and I want to do k-means clustering of my samples. Based on your comments, do you mean there is no need to scale my data after vst normalization? I'd appreciate your help!
