How to calculate fold change?
0
1
Entering edit mode
6.7 years ago
Arindam Ghosh ▴ 530

The gene FPKM values as extracted by ballgown gexpr() are:

Gene_id           A.1       A.2     B.1     B.2     A.3     A.4     B.3      B.4
ENSG00000252139   13.225    0.135   0.203   0.288   0.117   0.129   18.956   32.179
ENSG00000131914   624.105   644.104 0.594   0.480   673.398 654.474 16.161   14.910

where A and B are two groups and 1/2/3/4 are replicates of each group. I used stattest to find DEG and the result for these two genes are:

id              feature fc          pval            qval    
ENSG00000252139 gene    1916267.358 0.002611388     0.022444352
ENSG00000131914 gene    1865398.512 0.000572501     0.009995431

The fold-change (FC) is abnormally high. If FC = Group A / Group B than the value cannot be such high. Can anyone explain about the method for the calculation of fold change. I have read the paper and it's not totally clear. This is not just for two gene. Or where might have I gone wrong?

RNA-Seq ngs statistics R • 4.9k views
ADD COMMENT
0
Entering edit mode

Can you post the exact command you used to produce that? I totally agree with you that those results look highly suspicious.

ADD REPLY
0
Entering edit mode

For finding DGE:

pheno_data = read.csv("compare.csv")
bg = ballgown (dataDir = "/home/bioinfo/Documents/Data", samplePattern = "SRR", pData = pheno_data)
bg_table = texpr (bg, 'all')
bg_gene_name = unique (bg_table[,9:10])
save (bg, file = 'bg.rda')
result_genes = stattest(bg, feature = "gene", covariate = "cell", getFC = TRUE, meas = "FPKM")
result_genes = merge (result_genes, bg_gene_name, by.x = c("id"), by.y = c ("gene_id"))
bg_filt = subset (bg, "rowVars(texpr(bg))>1", genomesubset = TRUE)
bg_filt_table = texpr (bg_filt, 'all') 
bg_filt_gene_names = unique (bg_filt_table [,9:10])
result_genes = stattest(bg_filt, feature = "gene", covariate = "cell", getFC = TRUE, meas = "FPKM")
result_genes = merge (result_genes, bg_filt_gene_names, by.x = c("id"), by.y = c ("gene_id"))

To extract gene expressions:

gexpr(bg)
ADD REPLY
0
Entering edit mode

I presume that cell relates to your groups A and B?

Is that a linear or log fold-change that you're displaying? The log2s of your FC values are ~20

ADD REPLY
0
Entering edit mode

The co-variate cell contains the groups A and B. Basically I have sample from two type of cells - A and B.

I am not sure about the FC values. Yes the log2 values is 20. But why the value is so high when represented without log? That's the exact thing I am seeking answer for.

ADD REPLY
0
Entering edit mode

2^20 is a very very large value (~1 million). That value as a change seems unreasonably high unless the gene went from unexpressed to expressed (at which point fold-changes aren't terribly meaningful).

ADD REPLY
0
Entering edit mode

It does seem to have gone from unexpressed to expressed, if we see the first table for gee FPKM above. But still if we take simple ratio of the mean of two values, it shouldn't have been so high.

ADD REPLY
0
Entering edit mode

Agreed, though I presume that the FPKMs are not what's actually used in the statistical test.

ADD REPLY
0
Entering edit mode

Then which values might have been taken into account?

ADD REPLY
1
Entering edit mode

Presumably this is described in the ballgown paper and they're using some sort of robustly normalized count.

ADD REPLY
0
Entering edit mode

Presuming that the things are still going as scripted, should I proceed with my results? I am bit extra cautious as this is the first time I am working with NGS data. Suggestions are definitely welcome.

ADD REPLY
1
Entering edit mode

I think those values are unreasonably high, I wouldn't proceed until I believed them.

ADD REPLY

Login before adding your answer.

Traffic: 2676 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6