Question

What is the difference between log2(Mutant/Wildtype) and log2(Wildtype/Mutant) in cuffdiff output?

7.9 years ago

bioinforesearchquestions ▴ 370

Hello folks,

I have an Excel file generated from the Cuffdiff gene-level output, with the following columns:

Gene, locus, sample_1, sample_2, status, value_1, value_2, log2(fold_change), test_stat, p_value, q_value, significant

Casp1, chr9:5298516-5307281, MUT, WT, OK, 123.019, 0.671358, -7.51758, 6.17607, 6.57E-10, 6.36E-07, yes

According to the Excel file, sample_1 is Mutant and sample_2 is Wildtype. log2(fold_change) is calculated as log2(value_2/value_1), i.e. log2(0.671358/123.019) = -7.51758.
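As a quick check of that formula, here is a minimal R sketch that recomputes the Casp1 value from the value_1 and value_2 columns (FPKMs copied from the row above; the result should agree with the exported -7.51758 up to rounding of the printed FPKMs):

```r
# Casp1 row from the question: sample_1 = MUT, sample_2 = WT
value_1 <- 123.019    # FPKM in sample_1 (Mutant)
value_2 <- 0.671358   # FPKM in sample_2 (Wildtype)

# Cuffdiff's log2(fold_change) column is log2(value_2 / value_1)
log2_fc <- log2(value_2 / value_1)
print(round(log2_fc, 4))   # close to the reported -7.51758
```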

I thought it should be log2(final/initial); isn't that right?

What is the difference between log2(Mutant/Wildtype) and log2(Wildtype/Mutant)?

How can I show the fold change (e.g. a 4-fold or 5-fold change) in a heatmap?

RNA-SEQ heatmap cuffdiff

score 3 · Answer 1 · 2017-10-02

7.9 years ago

Renesh ★ 2.2k

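In cuffdiff output, value_1 and value_2 are the FPKM values of the first and second conditions (sample_1 and sample_2), which are normally the control and experimental conditions. Cuffdiff calculates the fold change as log2(value_2/value_1), i.e. how much gene expression changes in the experimental condition relative to the control.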

However, which condition ends up as control and which as experimental depends on the order in which they were given when cuffdiff was run. You need to share the command that was used to run cuffdiff.

What is the difference between log2(Mutant/Wildtype) and log2(Wildtype/Mutant)?

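In your file, mutant is value_1 and wildtype is value_2, so the reported value is log2(Wildtype/Mutant): the expression change in wildtype relative to mutant. log2(Mutant/Wildtype) is the reverse, the change in mutant relative to wildtype; it has the same magnitude with the opposite sign.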
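A minimal R sketch, using the Casp1 row from the question, of turning Cuffdiff's log2(WT/MUT) column into log2(MUT/WT) by flipping the sign; the column name log2_MUT_over_WT is made up for illustration:

```r
# One row of the Cuffdiff gene table from the question (sample_1 = MUT, sample_2 = WT)
res <- data.frame(gene = "Casp1", sample_1 = "MUT", sample_2 = "WT",
                  value_1 = 123.019, value_2 = 0.671358,
                  log2_fold_change = -7.51758,
                  stringsAsFactors = FALSE)

# Cuffdiff reports log2(value_2 / value_1) = log2(WT / MUT).
# Since log2(a / b) = -log2(b / a), log2(MUT / WT) is just the negated column.
res$log2_MUT_over_WT <- -res$log2_fold_change

# Cross-check against the FPKM columns (tolerance covers rounding in the export)
stopifnot(isTRUE(all.equal(res$log2_MUT_over_WT,
                           log2(res$value_1 / res$value_2),
                           tolerance = 1e-4)))
print(res[, c("gene", "log2_fold_change", "log2_MUT_over_WT")])
```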

How can I show the fold change (e.g. a 4-fold or 5-fold change) in a heatmap?

You can use a color intensity/gradient scale to show the magnitude of the fold change in the heatmap. See the heatmap.2 function in the gplots R package.
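A rough base-R sketch of how a gradient scale maps fold changes to colours, assuming the values are plotted on the log2 scale (a 4-fold change is log2(4) = 2, a 5-fold change is about 2.32) with symmetric breaks from -3 to 3. The gene names and values are hypothetical, and this only approximates the idea behind heatmap.2's breaks and col arguments; it is not heatmap.2's own code:

```r
# Hypothetical log2 fold changes for illustration only
log2fc <- c(geneA = log2(4), geneB = log2(5), geneC = 0, geneD = -log2(4))

# 101 symmetric breaks from -3 to 3 define 100 colour bins (green = down, red = up)
breaks <- seq(-3, 3, length.out = 101)
cols   <- colorRampPalette(c("green", "black", "red"))(100)

# Clip to the break range, then find which colour bin each value falls into
clipped <- pmin(pmax(log2fc, min(breaks)), max(breaks))
bin     <- findInterval(clipped, breaks, rightmost.closed = TRUE, all.inside = TRUE)
print(data.frame(log2fc = round(log2fc, 2), colour = cols[bin]))
```

Putting 0 in the middle of the breaks keeps "no change" at black, so equal up- and down-regulation (e.g. 4-fold up and 4-fold down) get equally intense colours.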

Thanks, Renesh, for the explanation.

I only have the Excel file; the Cuffdiff analysis was done by someone else. I believe they may have mistakenly given the mutant as "CONTROL" and the wildtype as "EXPERIMENTAL".

Reply by bioinforesearchquestions · 7.9 years ago

Do not make assumptions as an analyst (unless you have first-hand knowledge of the experimental details). You should verify your suspicion with the person who gave you the analysis or the people who generated the data.

Reply by GenoMax · 7.9 years ago

Hi GenoMax, that person left the lab a long time ago and only provided the Excel file, so I don't have a proper log of the analysis. Thanks for the warning.

Reply by bioinforesearchquestions · 7.9 years ago

Hi GenoMax,

Does the order in which the samples are given in the Cuffdiff command affect the differential expression results?

Cuffdiff treats label 1 as sample_1 and label 2 as sample_2, so is it an unwritten convention to always give the control as label 1 and the experimental/mutant as label 2?

Reply by bioinforesearchquestions · 7.9 years ago

Okay, but be careful with this. You should get the analysis code from that person: if the labeling was done incorrectly, your results will be completely reversed.

Reply by Renesh · 7.9 years ago

Hi Renesh,

This is the command, from the same person, for another analysis of lymph nodes and spleen together:

cuffdiff -o diff_out_Lymphnodes_Spleen -b mouse_mm10.fa -p 12 -L Mutant_Lymph_Spleen,WT_Lymph_Spleen -u merged_asm/merged.gtf Mutant_Lymph/accepted_hits.bam,Mutant_Spleen/accepted_hits.bam Wildtype_Lymph/accepted_hits.bam,Wildtype_Spleen/accepted_hits.bam

I am currently working on the lymph node analysis. Based on the labeling above, I suspect the following command was used:

cuffdiff -o diff_out_Lymphnodes -b mouse_mm10.fa -p 12 -L Mutant_Lymph,WT_Lymph -u merged_asm/merged.gtf Mutant_Lymph/accepted_hits.bam Wildtype_Lymph/accepted_hits.bam

Reply by bioinforesearchquestions · 7.9 years ago

From the command above, the mutant was given as the first label (sample_1, effectively the "control") and the wildtype as the second (sample_2, "experimental"), so log2(fold_change) = log2(WT/MUT). If you want genes up-regulated in the mutant, look at negative fold changes; genes down-regulated in the mutant have positive fold changes.
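Following that logic, a minimal sketch of pulling out genes up- and down-regulated in the mutant from a table with this sample order. The gene names and values are hypothetical, and the significant column is assumed to hold "yes"/"no" as in the Excel export:

```r
# Hypothetical rows in the same layout as the Excel export
# (sample_1 = Mutant_Lymph, sample_2 = WT_Lymph, so log2_fold_change = log2(WT/MUT))
res <- data.frame(
  gene             = c("geneA", "geneB", "geneC"),
  log2_fold_change = c(-3.1, 2.4, -0.2),
  significant      = c("yes", "yes", "no"),
  stringsAsFactors = FALSE
)

sig <- res[res$significant == "yes", ]
up_in_mutant   <- sig$gene[sig$log2_fold_change < 0]  # higher in MUT than in WT
down_in_mutant <- sig$gene[sig$log2_fold_change > 0]  # lower in MUT than in WT
print(up_in_mutant)    # "geneA"
print(down_in_mutant)  # "geneB"
```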

Reply by Renesh · 7.9 years ago

score 1 · Answer 2 · 2017-10-02

7.9 years ago

Kevin Blighe 89k

If, for GeneX, Sample1's expression is 20 and Sample2's expression is 5, then:

log2(Sample1/Sample2) = 2

We can make the following statement: Sample1 has higher expression than Sample2 for GeneX.

log2(Sample2/Sample1) = -2

We can make the following statement: Sample2 has lower expression than Sample1 for GeneX.

Both statements say the same thing. You can see, however, that the choice of numerator and denominator is important.
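The same numbers in R, plus the back-transformation that turns a log2 value into the "n-fold" wording asked about in the question (a sketch; the sprintf wording is just one way to report it):

```r
s1 <- 20  # Sample1 expression for GeneX
s2 <- 5   # Sample2 expression for GeneX

lfc_12 <- log2(s1 / s2)  # 2
lfc_21 <- log2(s2 / s1)  # -2

# Back-transform to a linear fold change: 2^|log2FC| gives "n-fold",
# and the sign says which sample is higher
fold <- 2^abs(lfc_12)    # 4, i.e. a 4-fold difference
direction <- ifelse(lfc_12 > 0, "higher in Sample1", "higher in Sample2")
cat(sprintf("log2(S1/S2) = %g, log2(S2/S1) = %g: %g-fold, %s\n",
            lfc_12, lfc_21, fold, direction))
```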

This should not matter for the heatmap. If a gene has higher expression in one sample than in another, the heatmap will show shading representative of the higher level (e.g. red, if your colour scheme goes from green to black to red for low to normal to high expression). Heatmap functions in R will usually scale your data to Z-scores for the purposes of colour shading (base R's heatmap() does this by default, with scale = "row"), so the colours then represent standard deviations from the mean rather than fold changes. You can switch off this scaling and transform the data your own way using the following commands:

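library(gplots)                          # provides heatmap.2() and greenred()
heat <- t(scale(t(MyDataMatrix)))        # Z-score each gene (row) yourself
myBreaks <- seq(-3, 3, length.out=101)   # 101 breaks -> 100 colour bins
heatmap.2(heat, breaks=myBreaks, col=greenred(100), scale="none")  # plus any other heatmap.2 options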
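To make the Z-score step concrete, here is a small base-R check (the matrix values are made up) that t(scale(t(m))) subtracts each row's mean and divides by each row's standard deviation, so the colours reflect standard deviations from each gene's mean rather than fold changes:

```r
# Hypothetical 2-gene x 3-sample expression matrix
m <- matrix(c(10, 20, 30,
               5,  5, 50),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("geneX", "geneY"), c("s1", "s2", "s3")))

# Row-wise Z-scores, as in the heatmap.2 recipe
z <- t(scale(t(m)))

# The same thing by hand: (x - row mean) / row standard deviation
z_manual <- (m - rowMeans(m)) / apply(m, 1, sd)

print(round(z, 3))
stopifnot(isTRUE(all.equal(z, z_manual, check.attributes = FALSE)))
```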

Thanks, Kevin. In general, isn't (final/initial) what is used in the wet lab?

As you mentioned, in order to get the Z scale for the heatmap, I log-transformed the FPKM values:

> FPKM <- read.table("dataset.csv", sep=",", header=TRUE, row.names=1)
> nrow(FPKM)
[1] 21
> FPKM_log10 <- log10(FPKM+1)
> head(FPKM)
      MUT_Lymph  WT_Lymph
Actn1  29.42360  92.09680
Ccr7   61.08610 177.92300
Ctla4  53.45150  10.96750
Dapl1  43.16620 140.23300
> head(FPKM_log10)
      MUT_Lymph  WT_Lymph
Actn1 1.4832106 1.9689348
Ccr7  1.7929944 2.2526662
Ctla4 1.7360098 1.0780034
Dapl1 1.6450900 2.1499362
> Log_data_matrix <- data.matrix(FPKM_log10)
> heatmap.2(Log_data_matrix, scale="row", col=greenred, trace="none", margins=c(5,5), cexRow=0.7, cexCol=0.7, dendrogram='both', Rowv=TRUE, Colv=TRUE, reorderfun=function(d,w) reorder(d, w, agglo.FUN=mean), distfun=function(x) as.dist(1-cor(t(x))), hclustfun=function(x) hclust(x, method="complete"))

In order to show just the fold change between mutant and wildtype in the heatmap, I believe I should use the FPKM values instead of the log-transformed FPKM values.
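A quick base-R check on the four genes shown in head(FPKM) above suggests a caveat for the two-column case: with scale="row", every gene becomes (+0.707, -0.707) or (-0.707, +0.707), i.e. plus or minus 1/sqrt(2), whatever its fold change, and log10(FPKM+1) gives the same signs because the log is monotonic. A two-sample row-scaled heatmap therefore shows direction but not magnitude. If the magnitude is what you want, one option (not something proposed in the thread) is to colour genes by a log2 ratio such as log2(MUT/WT):

```r
# The four genes shown in head(FPKM) above
fpkm <- matrix(c(29.42360,  92.09680,
                 61.08610, 177.92300,
                 53.45150,  10.96750,
                 43.16620, 140.23300),
               ncol = 2, byrow = TRUE,
               dimnames = list(c("Actn1", "Ccr7", "Ctla4", "Dapl1"),
                               c("MUT_Lymph", "WT_Lymph")))

# Row Z-scores of raw FPKM and of log10(FPKM + 1)
z_raw <- t(scale(t(fpkm)))
z_log <- t(scale(t(log10(fpkm + 1))))
print(round(z_raw, 3))

# With only two columns, every |Z| is 1/sqrt(2), and both inputs give the same signs
stopifnot(isTRUE(all.equal(abs(z_raw), matrix(1 / sqrt(2), 4, 2),
                           check.attributes = FALSE)))
stopifnot(isTRUE(all.equal(sign(z_raw), sign(z_log), check.attributes = FALSE)))

# A log2 ratio per gene keeps the magnitude, e.g. log2(MUT / WT)
log2_mut_wt <- log2(fpkm[, "MUT_Lymph"] / fpkm[, "WT_Lymph"])
print(round(log2_mut_wt, 2))
```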

Reply by bioinforesearchquestions · 7.9 years ago

Yes, I would just use the FPKM values for the heatmap. With scale="row", heatmap.2 will scale them itself, transforming each gene's values into Z-scores.

Also take a look at my colleague's answer below.

Reply by Kevin Blighe · 7.9 years ago

Just use your FPKM data matrix for the heatmap. Do not use the log-transformed one.