Hi all!
I'm trying to detect differentially expressed genes (DEGs) using RNA-seq data obtained from "GFP positive cells" and "GFP negative cells". However, the cDNA was sequenced with two different methods: one is "normal" RNA-seq, and the other is low-input RNA-seq (which only requires a small amount of starting material). Here's a summary of the number of datasets I have for each cell type with each method:
GFP positive cells * normal RNA-seq : 1
GFP negative cells * normal RNA-seq : 1
GFP positive cells * low-input RNA-seq : 3
GFP negative cells * low-input RNA-seq : 2
In such a case, what kind of statistics/tools can be applied to detect DEGs in GFP positive cells vs. GFP negative cells?
Thanks all!
Thanks Santosh! Would DESeq2 accept FPKM values as input, or only raw read counts?
Almost all software for differential expression, including DESeq2, requires raw read counts for the statistical model to work correctly.
What if I only have FPKM values? Could I take the log of them and remove lowly expressed genes (so that the distribution is approximately normal)?
You can't simply un-normalize the data. Some tools do differential expression with FPKM, but you need to have the original depth of the libraries. Also, RNA-seq count data don't follow a Gaussian distribution; they are usually modelled as negative binomial. For a detailed discussion, see this Reddit thread:
https://www.reddit.com/r/bioinformatics/comments/3bx3em/fpkm_vs_raw_read_count_for_differential/
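To illustrate why the original library depth matters, here is a minimal Python sketch, assuming the standard definition FPKM = reads * 1e9 / (gene length in bp * total mapped reads). The function names, counts, gene lengths and depths are made up for illustration and do not come from any particular tool. Real pipelines may use effective lengths or rounded values, so even with the right depth the recovered counts would only be approximate.

```python
def counts_to_fpkm(counts, lengths_bp, total_mapped_reads):
    """FPKM = reads * 1e9 / (gene length in bp * total mapped reads)."""
    return [c * 1e9 / (length * total_mapped_reads)
            for c, length in zip(counts, lengths_bp)]


def fpkm_to_counts(fpkm, lengths_bp, total_mapped_reads):
    """Invert the formula above; needs the same lengths and library depth."""
    return [f * length * total_mapped_reads / 1e9
            for f, length in zip(fpkm, lengths_bp)]


# Hypothetical example values (not real data)
counts = [150, 30, 0]
lengths_bp = [1500, 600, 2000]
depth = 20_000_000  # total mapped reads in this library

fpkm = counts_to_fpkm(counts, lengths_bp, depth)

# With the true depth, the counts come back
recovered = fpkm_to_counts(fpkm, lengths_bp, depth)

# With a guessed (wrong) depth, every count is scaled by the same wrong factor
wrong_depth = fpkm_to_counts(fpkm, lengths_bp, 5_000_000)

print(fpkm)
print(recovered)
print(wrong_depth)
```

The same FPKM value maps to very different counts depending on the depth you assume, and count-based models rely on the actual count magnitudes to estimate variance, which is why FPKM alone is not enough.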
Got it. Thanks Santosh!
If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.