6.7 years ago
Arindam Ghosh
The gene FPKM values as extracted by ballgown gexpr() are:
Gene_id A.1 A.2 B.1 B.2 A.3 A.4 B.3 B.4
ENSG00000252139 13.225 0.135 0.203 0.288 0.117 0.129 18.956 32.179
ENSG00000131914 624.105 644.104 0.594 0.480 673.398 654.474 16.161 14.910
where A and B are two groups and 1/2/3/4 are the replicates of each group. I used stattest to find DEGs, and the results for these two genes are:
id feature fc pval qval
ENSG00000252139 gene 1916267.358 0.002611388 0.022444352
ENSG00000131914 gene 1865398.512 0.000572501 0.009995431
The fold-change (FC) is abnormally high. If FC = Group A / Group B, then the value cannot be this high. Can anyone explain how the fold change is calculated? I have read the paper and it is not entirely clear. This is not limited to these two genes. Or where might I have gone wrong?
Can you post the exact command you used to produce that? I totally agree with you that those results look highly suspicious.
For finding DGE:
To extract gene expressions:
I presume that cell relates to your groups A and B?
Is that a linear or log fold-change that you're displaying? The log2s of your FC values are ~20
The covariate cell contains the groups A and B. Basically, I have samples from two types of cells, A and B.
I am not sure about the FC values. Yes, the log2 value is about 20. But why is the value so high when reported without the log? That is exactly what I am looking for an answer to.
2^20 is a very very large value (~1 million). That value as a change seems unreasonably high unless the gene went from unexpressed to expressed (at which point fold-changes aren't terribly meaningful).
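To make the scale concrete, here is a small sketch that takes the log2 of the two fc values exactly as they appear in the stattest output above. The variable names are only illustrative, and this says nothing about how ballgown derives fc internally; it just converts the reported numbers.

```python
import math

# fc values copied from the stattest output posted in the question
reported_fc = {
    "ENSG00000252139": 1916267.358,
    "ENSG00000131914": 1865398.512,
}

# log2 of each reported fold change
log2_fc = {gene: math.log2(fc) for gene, fc in reported_fc.items()}

for gene, value in log2_fc.items():
    print(f"{gene}: log2(fc) = {value:.2f}")

# For comparison, 2**20 is about one million
print(f"2^20 = {2 ** 20}")
```

Both values come out a little under 21, so the reported numbers are linear fold changes of roughly two million.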
It does seem to have gone from unexpressed to expressed, judging by the first table of gene FPKM values above. But even so, a simple ratio of the two group means should not be this high.
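As a sanity check on that point, the simple ratio of group means can be computed directly from the FPKM table in the question. This is only a sketch of the naive A/B calculation on the posted values (the helper name group_mean is made up for illustration); it is not a reproduction of what stattest does.

```python
# FPKM values copied from the table in the question.
# Note the column order: A.1 A.2 B.1 B.2 A.3 A.4 B.3 B.4
columns = ["A.1", "A.2", "B.1", "B.2", "A.3", "A.4", "B.3", "B.4"]
fpkm = {
    "ENSG00000252139": [13.225, 0.135, 0.203, 0.288, 0.117, 0.129, 18.956, 32.179],
    "ENSG00000131914": [624.105, 644.104, 0.594, 0.480, 673.398, 654.474, 16.161, 14.910],
}


def group_mean(values, group):
    """Mean of the values whose column label starts with '<group>.'."""
    picked = [v for c, v in zip(columns, values) if c.startswith(group + ".")]
    return sum(picked) / len(picked)


ratios = {}
for gene, values in fpkm.items():
    mean_a = group_mean(values, "A")
    mean_b = group_mean(values, "B")
    ratios[gene] = mean_a / mean_b
    print(f"{gene}: mean A = {mean_a:.3f}, mean B = {mean_b:.3f}, A/B = {ratios[gene]:.3f}")
```

On these numbers the naive A/B ratio is about 0.26 for ENSG00000252139 and about 81 for ENSG00000131914, several orders of magnitude below the reported fc values.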
Agreed, though I presume that the FPKMs are not what's actually used in the statistical test.
Then which values might have been taken into account?
Presumably this is described in the ballgown paper and they're using some sort of robustly normalized count.
Assuming everything else ran as scripted, should I proceed with my results? I am being extra cautious because this is the first time I am working with NGS data. Suggestions are definitely welcome.
I think those values are unreasonably high. I wouldn't proceed until I believed them.