Question

Ballgown finds few DE genes compared to DESeq

1

6.5 years ago

corend ▴ 70

I have RNA-seq data with 2 conditions and 3 replicates per condition.

I ran the New Tuxedo pipeline and also created some read count tables with prepDE.

I analysed differentially expressed genes with Ballgown and DESeq2.

With a log2FoldChange threshold of 1 and a padj threshold of 0.01 in DESeq2: 14400/32000 genes (45%) are DE.

With a pval threshold of 0.01 in Ballgown: 3678/32000 genes (about 11%) are DE. Even with no fold change threshold, the number of DE genes is much lower. In Ballgown, what is the difference between qval and pval? Which one corresponds to padj in DESeq2?

I expect many DE genes, as the conditions are very different biologically (testis vs ovary, same species).

Why is there such a large difference between the two tools?

RNA-Seq DESeq2 Ballgown • 4.5k views

ADD COMMENT • link updated 6.2 years ago by shenwei1376 • 0 • written 6.5 years ago by corend ▴ 70

1

Two things: 1) two completely different statistical frameworks, 2) two different pval cutoffs. I hope the one you used for Ballgown is FDR-adjusted. If so, why 0.05 there and 0.01 in DESeq2?

ADD REPLY • link 6.5 years ago by ATpoint 86k

0

I agree with point 1), but even if the statistical methods are different, I would expect approximately the same results, no?

For point 2), I have edited my post, thanks!

ADD REPLY • link 6.5 years ago by corend ▴ 70

0

Hi, I am wondering if you have solved this issue. I am seeing the same problem: Ballgown gives significantly fewer DE genes than DESeq2. It might not be due to FPKM, as my TopHat-Cufflinks-Cuffdiff pipeline produces results similar to DESeq2. Thank you!

ADD REPLY • link 6.2 years ago by shenwei1376 • 0

0

Please use Add comment rather than the answer field for comments. Is there any specific reason you use ballgown rather than DESeq2 or edgeR?

ADD REPLY • link 6.2 years ago by ATpoint 86k

0

corend, if you could follow up with ATpoint, that would be great. Also, one should never expect that these programs produce the same results.

ADD REPLY • link updated 4.8 years ago by ATpoint 86k • written 6.2 years ago by Kevin Blighe 88k

1

In addition (and sorry for reviving a zombie post), I think the correct way to run the analysis with Ballgown would be to set libadjust to FALSE. Otherwise you take FPKM values (which already normalize to some extent for library size) and then adjust them again by (quoting the manual) "the sum of the sample's log expression measurements below the 75th percentile of those measurements".
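
As a rough illustration of the sentence quoted from the manual, the base R sketch below computes that kind of per-sample adjustment on a toy FPKM matrix. It is a reading of the manual's description, not Ballgown's actual code; the matrix values, the log2(x + 1) transform, and the name lib_adj are assumptions made for the example.

```r
# Toy FPKM matrix: 6 genes (rows) x 2 samples (columns); values are made up
fpkm <- matrix(c(0, 1, 3, 7, 15, 31,
                 0, 3, 7, 15, 31, 63),
               ncol = 2,
               dimnames = list(paste0("gene", 1:6), c("s1", "s2")))

# Log-transform; the +1 offset is an assumption to avoid log(0)
log_expr <- log2(fpkm + 1)

# Per sample: sum the log expression values below that sample's 75th percentile
lib_adj <- apply(log_expr, 2, function(x) {
  q3 <- quantile(x, 0.75)
  sum(x[x < q3])
})

print(lib_adj)
```

According to the stattest() documentation, passing libadjust=FALSE requests no library-size adjustment, e.g. stattest(bg, feature="gene", covariate="Tissue", getFC=TRUE, meas="FPKM", libadjust=FALSE), so a covariate of this kind would not enter the model on top of the FPKM normalization.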

ADD REPLY • link 4.8 years ago by Fabio Marroni ★ 3.0k

score 4 · Accepted Answer · 2018-07-05

4

6.5 years ago

Kevin Blighe 88k

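pval is the nominal (unadjusted) p-value. qval is the p-value adjusted for multiple testing; adjusted p-values are also known as q-values (not many people know this). qval is therefore the one that corresponds to padj in DESeq2.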
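
To make the pval/qval distinction concrete, here is a minimal base R sketch. It assumes a Benjamini-Hochberg adjustment (the default method behind padj in DESeq2); the p-values are made up for illustration and do not come from the data in this thread.

```r
# Toy nominal p-values (hypothetical, for illustration only)
pval <- c(0.0001, 0.004, 0.009, 0.02, 0.04, 0.2, 0.5, 0.9)

# Benjamini-Hochberg adjustment; "fdr" is an alias for "BH" in p.adjust()
qval <- p.adjust(pval, method = "BH")

# Number of features called significant at 0.01 on each scale
n_nominal  <- sum(pval < 0.01)
n_adjusted <- sum(qval < 0.01)

print(data.frame(pval, qval))
cat("pval < 0.01:", n_nominal, " qval < 0.01:", n_adjusted, "\n")
```

An adjusted value is never smaller than the nominal one, so the same cutoff applied to qval can only keep the same number of genes or fewer than it does on pval. The like-for-like comparison with a padj cutoff in DESeq2 is therefore a qval cutoff in Ballgown.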

Ballgown may be using FPKM data when conducting the differential expression analysis. FPKM is not suitable for this purpose. Please confirm the type of normalisation that you used in Ballgown.

When you ran DESeq2, did you use the lfcShrink() function? - see the piece of code that I posted here: A: DESeq2 Appropriate Settings for Poorly Clustering Samples?

ADD COMMENT • link 6.5 years ago by Kevin Blighe 88k

0

I am not sure I understand what you mean by

may be sing FPKM data when conducting the differential expression analysis

I used this command line in ballgown:

results_genes = stattest(bg, feature="gene",
                         covariate="Tissue", getFC=TRUE,
                         meas="FPKM")

In DESeq2, I didn't use lfcShrink(), but I used betaPrior=TRUE. In your post, if I understand correctly, you say that lfcShrink() is useful when replicates cluster poorly?

ADD REPLY • link 6.5 years ago by corend ▴ 70

1

Sorry, that was a typo on my part. I meant to write:

Ballgown may be using FPKM data when conducting the differential expression analysis

Based on the code that you've provided, it is indeed using FPKM. FPKM is not suitable for conducting differential expression analysis, just so you are aware. This may indirectly contribute to the problem that you find.

Oh, the use of lfcShrink() is not confined to cases where replicates group poorly. If you have used betaPrior=TRUE, then do not worry about lfcShrink(), for now. lfcShrink is part of the latest updates to the DESeq2 package.

Apart from everything that I've already mentioned, differences across differential expression analysis tools should be expected.

ADD REPLY • link 6.5 years ago by Kevin Blighe 88k

0

Thanks a lot, I will keep using DESeq2 for DE analysis and use Ballgown to get an idea of gene expression levels in FPKM. You can switch this to an answer!

ADD REPLY • link 6.5 years ago by corend ▴ 70