Hi all,
I have come across a gene which shows a 7-fold increase in Treatment (3 replicates) vs Control (3 replicates) when looking at FPKMs (I took the mean of Treatment / mean of Control to get a fold change).
But when doing differential expression (using counts from htseq-count), I am getting a log2 fold change of -0.5, which means the Treatment has lower expression compared to the Controls.
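To make it concrete, this is roughly the comparison I'm making (a minimal sketch with made-up FPKM values, just to show the two calculations; the actual numbers come from my pipeline):

```python
import numpy as np

# Hypothetical per-replicate FPKM values for one gene (illustration only)
fpkm_treatment = np.array([14.0, 21.0, 35.0])  # 3 Treatment replicates
fpkm_control = np.array([2.0, 4.0, 4.0])       # 3 Control replicates

# What I did: mean(Treatment FPKM) / mean(Control FPKM)
fpkm_fold_change = fpkm_treatment.mean() / fpkm_control.mean()
print(f"FPKM-based fold change: {fpkm_fold_change:.1f}")  # ~7

# What the count-based DE analysis reports for the same gene
log2fc_from_counts = -0.5
print(f"count-based fold change: {2 ** log2fc_from_counts:.2f}")  # ~0.71, i.e. lower in Treatment
```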
There are a couple more such instances that show discordance between the differential expression results and the FPKMs.
Any suggestions?
Thanks,
Ron
I'll add that FPKMs are sometimes sequencing-depth normalized in incredibly problematic and non-robust ways. As a general rule, DESeq2/edgeR/limma are to be trusted over anything the FPKMs say.
So, which result should I go with in this case: the fold change from FPKMs or the fold change from DESeq?
Don't use fold changes from FPKM for any type of quantitative analysis. They will not be properly normalized for comparisons between samples or replicates. You should instead look at fold changes that normalize for these differences. In a "standard" analysis, DESeq/DESeq2/edgeR/limma will perform such a normalization.
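To give a sense of what that normalization does, here is a rough sketch of the median-of-ratios size-factor idea that DESeq uses, written out by hand on a toy count matrix (the numbers are made up, and the real packages additionally handle zero counts, dispersion estimation, and fold-change shrinkage):

```python
import numpy as np

# Toy count matrix: rows = genes, columns = samples (made-up numbers; no zeros for simplicity)
counts = np.array([
    [100, 200, 150, 300],
    [ 50, 120,  60, 110],
    [ 10,  25,  12,  20],
    [500, 900, 480, 950],
], dtype=float)

# 1) per-gene geometric mean across samples (a pseudo-reference sample)
log_geo_means = np.log(counts).mean(axis=1)

# 2) per-sample ratios to that pseudo-reference, in log space
log_ratios = np.log(counts) - log_geo_means[:, None]

# 3) size factor = median ratio per sample
size_factors = np.exp(np.median(log_ratios, axis=0))
print("size factors:", size_factors.round(3))

# Counts divided by their size factor are comparable across samples,
# which is the scale on which fold changes should be computed
norm_counts = counts / size_factors
print(norm_counts.round(1))
```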
agreed.
at the moment best practices are:
deseq - library size correction and vst
limma - voom / cpm calculation with some sort of normalization method (e.g., quantile)
edger - not entirely sure, not too familiar with it.
i tend to use tpm a lot as well, as it doesn't have the same issues as FPKM and RPKM.
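for reference, the difference between FPKM and TPM comes down to the order of normalization; a quick sketch with made-up counts and gene lengths (standard formulas, nothing tool-specific):

```python
import numpy as np

# Toy data for one sample: counts and gene lengths in kb (made-up numbers)
counts = np.array([500_000.0, 1_000_000.0, 1_500_000.0])
length_kb = np.array([2.0, 0.5, 3.0])

# FPKM: divide by length, then by total mapped reads (in millions)
fpkm = (counts / length_kb) / (counts.sum() / 1e6)

# TPM: divide by length first, then rescale so each sample sums to one million
rate = counts / length_kb
tpm = rate / rate.sum() * 1e6

print("FPKM:", fpkm.round(1))
print("TPM :", tpm.round(1), "-> sums to", tpm.sum())  # always 1e6, which makes TPMs easier to compare across samples
```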
agreed with the agreement ;P. I'd only offer the caveat that, while TPM is universally preferable to F/RPKM, it is still a purely relative abundance metric. For this reason alone, it's not really appropriate to use as-is for quantitative comparison between samples/conditions.
eh, technically RNA-seq is only relative quantification without ERCC spike-ins, and even then it's somewhat iffy; the FDA SEQC project has some good work on this issue. also, both TPM and CPM can be used for between-sample comparisons because they are normalized per million reads, so that should be fine, but i think both are still biased towards larger transcripts. in the past i've found it's a good idea to use the 0.4 quantile as a cutoff for DEG analysis, or to use normalized counts from similarly sized intergenic regions as an indication of sequencing noise.
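to illustrate the kind of cutoff i mean (a rough sketch only; the CPM-based filter and the simulated counts are just for illustration, not a standard recipe):

```python
import numpy as np

# toy count matrix: 1000 genes x 6 samples (simulated, just for illustration)
rng = np.random.default_rng(0)
counts = rng.negative_binomial(n=5, p=0.01, size=(1000, 6)).astype(float)

# counts per million (per-sample library size normalization)
cpm = counts / counts.sum(axis=0) * 1e6

# drop the bottom 40% of genes by mean CPM before DEG testing
gene_means = cpm.mean(axis=1)
cutoff = np.quantile(gene_means, 0.4)
keep = gene_means > cutoff
print(f"cutoff = {cutoff:.1f} CPM, keeping {keep.sum()} of {len(keep)} genes")
```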
True, but spike-ins can exhibit crazy variability/imprecision. Theoretically they are great, but their practical utility depends to a large degree on the skill of the person preparing the samples.
yep, completely agree. massively parallel sequencing is unfortunately very noisy, especially for smaller and low-abundance features, and i think this problem scales linearly with sequencing depth. in my opinion it's still a very "experimental" technology.
Okay, another thing I wanted to know: is it the norm to see such differences (have you guys come across such differences in the past too)?
I've never seen differential expression ... ever ... :D. Just kidding. It depends. If your gene annotation has, say, 15,000 genes, and 14,000 of them are differentially expressed, then yeah, there's probably a problem. What you want are sanity checks: do you see something that you expect to see, i.e. a positive control?