5.2 years ago
LuisNagano
Hello, could anyone help me out? Is there an R package that can run a differential expression analysis or statistical test, i.e. generate log2FC and adjusted p-values, from normalized RNA-seq and array expression values? The data I need to analyze are in FPKM, a table with ~50000 genes, and I don't have access to the raw data.
Thank you very much!
Which tool was used for quantification? RSEM, StringTie, Cufflinks? Just try the tximport package from the DESeq2 team.
The authors don't cite the tool used for normalization. DESeq2 only works with raw gene counts, doesn't it? I have normalized data in FPKM. I want a package to analyse any normalized expression data, like MAS5, RSEM, FPKM, TPM...
This has been discussed before extensively and repeatedly, please use the search function. Start from this one: https://support.bioconductor.org/p/102551/ and from there please google around. You'll find pretty much the same answer: the limma-based strategy suggested there is probably the best possible, but still a bad, solution to what you aim to do, as FPKM is not suited for differential analysis. Further details on why that is can be found in numerous threads here, on BioC and on the web.
Look at tximport first. tximport will take normalised counts and length information to recompute raw counts... You can use tximport to get pseudo-raw counts and then use DESeq2 for differential gene expression analysis. It works for almost all modern quantification tools, like kallisto and StringTie. But you need to know which tool was used for gene quantification.
No, that is not true and not recommended. tximport aggregates transcript abundance estimates to the gene level and corrects for average transcript length; it does not do any magic to save you from inferior normalization techniques like FPKM. The transcript information is already lost in FPKM, as in most cases this is already the gene-level count, therefore tximport would be meaningless. If possible, download the raw data from NCBI or ENA and obtain raw counts. Everything else is inferior. Relying on prenormalized counts where (as OP states) the method section lacks details about the pipeline is not reproducible and therefore IMHO not recommended, beyond the issue that FPKM is a poor choice for normalization.
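To make the point about FPKM concrete, here is a small sketch using the usual FPKM definition (fragments on the gene, scaled by gene length in kb and by millions of mapped fragments). All numbers, the gene length, the library sizes and the helper names `fpkm` and `count_from_fpkm` are made up for illustration; this is not what any particular pipeline or package does internally.

```python
# Illustrative sketch only: counts, gene length and library sizes are hypothetical.

def fpkm(count, gene_length_bp, total_mapped_fragments):
    """Usual FPKM definition: fragments per kilobase of gene per million mapped fragments."""
    return count * 1e9 / (gene_length_bp * total_mapped_fragments)


def count_from_fpkm(fpkm_value, gene_length_bp, total_mapped_fragments):
    """Invert the definition. Note that this needs the library size, which an FPKM table lacks."""
    return fpkm_value * gene_length_bp * total_mapped_fragments / 1e9


gene_length = 2000  # bp, hypothetical gene

# Sample A: shallow library, 10 fragments on the gene.
fpkm_a = fpkm(10, gene_length, 5_000_000)
# Sample B: ten times deeper library, 100 fragments on the same gene.
fpkm_b = fpkm(100, gene_length, 50_000_000)

# Both samples end up with the same FPKM although the underlying counts differ tenfold.
print(fpkm_a, fpkm_b)

# The same FPKM maps back to different counts depending on the assumed library size.
print(count_from_fpkm(fpkm_a, gene_length, 5_000_000))
print(count_from_fpkm(fpkm_a, gene_length, 50_000_000))
```

A count of 10 and a count of 100 carry very different sampling uncertainty, and that is the information count-based tools such as DESeq2 rely on. Once only the FPKM value is left, and the library sizes are not reported, the counts cannot be reconstructed.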