I have RNA-seq data with 2 conditions and 3 replicates per condition.
I ran the New Tuxedo pipeline and also created some read count tables with prepDE.
I analysed differentially expressed genes with Ballgown and DESeq2.
With a log2FoldChange threshold of 1 and a padj threshold of 0.01 in DESeq2: 14400/32000 genes (45%) are DE.
With a pval threshold of 0.01 in Ballgown: 3678/32000 genes (about 11%) are DE. Even with no fold change threshold, the number of DE genes is much lower.
In Ballgown, what is the difference between qval and pval? Which one corresponds to padj in DESeq2?
I expect many DE genes, as the conditions are very different biologically (testis vs ovary, same species).
Why is there such a large difference between the two tools?
Two things: 1) these are two completely different statistical frameworks, and 2) you used two different p-value cutoffs. I hope the one you used for Ballgown is FDR-adjusted. If so, why 0.05 there and 0.01 in DESeq2?
I agree on point 1), but even if the statistical methods are different, I would expect approximately the same results, no?
For point 2), I have edited my post, thanks!
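On the pval vs qval question raised above: as far as I know, Ballgown's qval is the pval after FDR (Benjamini-Hochberg) adjustment, and DESeq2's padj uses the same kind of adjustment by default, so qval is the column to compare with padj. Below is a minimal sketch of that adjustment, written in Python purely for illustration. The function name and the p-values are made up, and this is not code from either package (in R, `p.adjust(p, method = "BH")` does this).

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
adj = benjamini_hochberg(raw)

n_raw = sum(p < 0.05 for p in raw)   # genes passing a raw p-value cutoff
n_adj = sum(q < 0.05 for q in adj)   # genes passing the adjusted cutoff
print(adj, n_raw, n_adj)
```

In this toy example, four genes pass a raw cutoff of 0.05 but only two pass the same cutoff after adjustment. Thresholding a raw pval in one tool and an adjusted padj in the other therefore does not compare like with like.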
Hi, I am wondering if you have solved this issue. I am seeing the same problem: Ballgown gives significantly fewer DE genes than DESeq2. It might not be the "FPKM", as my TopHat-Cufflinks-Cuffdiff run produces results similar to DESeq2. Thank you!
Please use "Add comment" rather than the answer field for comments. Is there any specific reason you use Ballgown rather than DESeq2 or edgeR? corend, if you could follow up with ATpoint, that would be great. Also, one should never expect that these programs produce the same results.
In addition (and sorry if I am reviving a zombie post), I think the correct way to perform the analysis with Ballgown would be to set libadjust to FALSE. Otherwise you start from FPKM values (which already normalize to some extent for the success of a sequencing run) that are then scaled as (quoting the manual) "the sum of the sample’s log expression measurements below the 75th percentile of those measurement".
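To make the quoted definition concrete, here is a rough sketch of that per-sample quantity in Python. It is not Ballgown's code. The log2(x + 1) transform, the interpolated percentile, the function names, and the FPKM values are all assumptions for illustration only.

```python
import math


def percentile(values, p):
    """Percentile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * p
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def library_size_adjustment(fpkm):
    """Sum of a sample's log expression values below their 75th percentile."""
    # Assumed transform: log2(x + 1), so that zero FPKM maps to zero.
    logged = [math.log2(x + 1) for x in fpkm]
    q3 = percentile(logged, 0.75)
    return sum(v for v in logged if v < q3)


# Hypothetical FPKM values for one sample.
sample_fpkm = [0, 1, 3, 7, 15]
adjustment = library_size_adjustment(sample_fpkm)
print(adjustment)
```

Because this quantity is computed from values that are already FPKM-normalized, applying it on top is a second depth-related correction, which is the concern raised above.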