Tissue specificity index (tau): should FPKM data be log-transformed?
8.3 years ago
microfuge

Hi, I would like to know whether FPKM data should be log-transformed before calculating the tissue specificity index (tau). Microarray data is log-transformed. But taking the log of FPKM gives negative values for anything below 1. Here, they discard FPKM < 1 so that no negative values are generated. I don't think normality is an assumption behind tau, is it? More generally, is a log transformation of expression values a necessity for tau?

Many thanks!
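For reference, a minimal Python/NumPy sketch of the usual tau definition, tau = sum_i (1 - x_i / max(x)) / (n - 1) over n tissues. The formula only rescales each gene by its own maximum, so nothing in it assumes normality; it does assume non-negative values, because with negative inputs x_i / max(x) can exceed 1 or flip sign and tau can fall outside [0, 1]. The FPKM values and the "set < 1 to 0, then log2(x + 1)" workaround are assumptions for illustration, not necessarily what the linked paper does.

```python
import numpy as np

def tau(x):
    """Tissue specificity index tau for one gene.

    x: expression values, one per tissue; must be non-negative.
    tau = sum_i (1 - x_i / max(x)) / (n - 1); 0 = uniform, 1 = one tissue only.
    """
    x = np.asarray(x, dtype=float)
    if np.any(x < 0):
        raise ValueError("tau expects non-negative values")
    if x.max() == 0:
        raise ValueError("tau is undefined when the gene is not expressed anywhere")
    return float(np.sum(1 - x / x.max()) / (len(x) - 1))

fpkm = np.array([0.2, 0.5, 3.0, 40.0])  # made-up FPKM values for one gene in 4 tissues
print(np.log2(fpkm))                     # first two entries are negative

# Workaround assumed here: FPKM < 1 -> 0 ("not expressed"), then log2(x + 1)
filtered = np.where(fpkm < 1, 0.0, fpkm)
logged = np.log2(filtered + 1)           # zeros stay 0, everything else is >= 1
print(round(tau(logged), 3))             # tau is computable again
```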

Tags: FPKM, tau, RNA-Seq
8.2 years ago
BioinfGuru

Hi,

I'm also tackling this problem. Have you found an answer? What I have decided to do is as follows:

• Set all genes with < 1 RPKM as not expressed

• Normalise all RPKM values by log transformation

• Then calculate a mean value across all replicates for each tissue (not too sure about using the mean here... I don't like the mean)

• Remove any genes not expressed in any tissue

• Use tau to calculate tissue specificity

• To identify a gene as tissue-specific, use a threshold near, but below, the second peak of the bimodal distribution (e.g. 0.8, as used in the 2016 benchmark paper you cite above)
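The steps above could be sketched in Python/NumPy roughly like this. Assumptions: a genes x samples matrix with one tissue label per column, a pseudocount of 1 before log2 (log2(0) is undefined, so "< 1 RPKM = not expressed" needs something like it), and a ">= 0.8" cutoff. `tissue_specific_genes` and `tau_rows` are hypothetical helper names. The `agg` parameter lets you swap the mean for `np.median` if you'd rather not use the mean.

```python
import numpy as np

def tau_rows(m):
    """Row-wise tau for a genes x tissues matrix of non-negative values."""
    m = np.asarray(m, dtype=float)
    return (1 - m / m.max(axis=1, keepdims=True)).sum(axis=1) / (m.shape[1] - 1)

def tissue_specific_genes(rpkm, tissues, min_rpkm=1.0, cutoff=0.8, agg=np.mean):
    """rpkm: genes x samples; tissues: tissue label for each sample (column).

    agg combines replicates (np.mean as in the steps above; np.median is an alternative).
    Returns (kept-gene mask, tau of kept genes, tissue-specific mask of kept genes).
    """
    rpkm = np.asarray(rpkm, dtype=float)
    tissues = np.asarray(tissues)
    expr = np.where(rpkm < min_rpkm, 0.0, rpkm)   # < 1 RPKM -> not expressed
    logged = np.log2(expr + 1)                    # +1 pseudocount (assumption) keeps 0 at 0
    labels = np.unique(tissues)
    per_tissue = np.column_stack(
        [agg(logged[:, tissues == t], axis=1) for t in labels]
    )                                             # genes x tissues
    keep = per_tissue.max(axis=1) > 0             # drop genes expressed in no tissue
    tau = tau_rows(per_tissue[keep])
    return keep, tau, tau >= cutoff

rpkm = [
    [10, 12, 10, 8, 11, 9],          # similar level in all tissues
    [0.5, 0.3, 0.2, 0.4, 50, 60],    # only in liver
    [0.1, 0.2, 0.5, 0.3, 0.9, 0.4],  # never reaches 1 RPKM
]
tissues = ["brain", "brain", "heart", "heart", "liver", "liver"]
keep, tau, specific = tissue_specific_genes(rpkm, tissues)
print(keep, tau.round(3), specific)
```

Here the third gene is dropped, the liver-only gene gets tau = 1 and is called specific, and the evenly expressed gene gets a tau close to 0.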

Feedback would be great.

Thanks, Kenneth.


So sorry for the delay; I somehow lost this thread. In the end I used array-based data, as it covered more tissues, and did not have time to check this further. But I did compare log-transformed vs. raw FPKM data, and the log transformation seems to change the value of the tissue specificity index a lot.


Yes, that is expected according to the benchmark paper:

"removing log-transformation has a greater influence on all parameters, in the direction of detecting more tissue-specificity, sometimes losing completely the signal of broad expression e.g. Tau"

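A small illustration of the direction described in that quote, using a hypothetical gene that is expressed in every tissue but much higher in one (log2(x + 1) is an assumed transform). On raw values it crosses a 0.8 cutoff; after the log it doesn't. This is not specific to the example: because log2(x + 1) is concave and maps 0 to 0, log2(x_i + 1) / log2(max + 1) >= x_i / max for every tissue, so for the same non-negative values the log-scale tau is never larger than the raw tau.

```python
import numpy as np

def tau(x):
    """tau = sum_i (1 - x_i / max(x)) / (n - 1) for non-negative x."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(1 - x / x.max()) / (len(x) - 1))

# Hypothetical FPKM: expressed in all 4 tissues, highest in one
gene = np.array([5.0, 10.0, 20.0, 100.0])
tau_raw = tau(gene)                 # raw scale
tau_log = tau(np.log2(gene + 1))    # log scale, +1 pseudocount (assumption)
print(round(tau_raw, 3), round(tau_log, 3))
```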
But does that risk increasing false discoveries? They don't say, but they do follow with:

"Tau... show[s] the highest correlations between normalized and non-normalized data". That sounds good, but when you look at the correlation values in the supplementary data... I don't think the correlation is high enough.

and "in the absence of log-transformation, the correlations between subsets of tissues or between species are in general weaker". So definitely don't use raw values if you are using a small set of tissues (my guess is < 10 tissue types) and you are working on data from an animal model (e.g. mouse data hoping to represent human).
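To check both points on your own data, a rough sketch: `raw_vs_log_r` correlates per-gene tau on raw vs. log2(x + 1) values, and `subset_r` correlates tau on all tissues with tau on a random subset of k tissues, with or without the log, which is one way to probe the "< 10 tissue types" guess. These helper names, Pearson correlation, the pseudocount and the simulated lognormal matrix are all assumptions for illustration; the benchmark paper's exact procedure may differ.

```python
import numpy as np

def tau_rows(m):
    """Row-wise tau for a genes x tissues matrix of non-negative values."""
    return (1 - m / m.max(axis=1, keepdims=True)).sum(axis=1) / (m.shape[1] - 1)

def threshold(fpkm, min_fpkm=1.0):
    """FPKM below min_fpkm -> 0 ("not expressed"); drop genes expressed nowhere."""
    fpkm = np.asarray(fpkm, dtype=float)
    expr = np.where(fpkm < min_fpkm, 0.0, fpkm)
    return expr[expr.max(axis=1) > 0]

def raw_vs_log_r(expr):
    """Pearson r between per-gene tau on raw and on log2(x + 1) values."""
    return float(np.corrcoef(tau_rows(expr), tau_rows(np.log2(expr + 1)))[0, 1])

def subset_r(expr, k, log=True, seed=0):
    """Pearson r between tau on all tissues and tau on a random subset of k >= 2 tissues."""
    if log:
        expr = np.log2(expr + 1)
    cols = np.random.default_rng(seed).choice(expr.shape[1], size=k, replace=False)
    sub = expr[:, cols]
    keep = sub.max(axis=1) > 0      # tau is undefined for genes silent in the subset
    return float(np.corrcoef(tau_rows(expr[keep]), tau_rows(sub[keep]))[0, 1])

# Simulated genes x tissues matrix (replicates already combined), NOT real data
rng = np.random.default_rng(0)
expr = threshold(rng.lognormal(mean=1.0, sigma=2.0, size=(500, 20)))

print("raw vs log tau, r =", round(raw_vs_log_r(expr), 3))
for k in (4, 8, 16):
    print(f"{k} tissues: log r = {subset_r(expr, k, log=True):.3f}, "
          f"raw r = {subset_r(expr, k, log=False):.3f}")
```

With a single seed you get one random subset; averaging over several seeds gives a steadier picture.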
