Hi all,
I have a question regarding differential expression using DESeq. I have 2 plant tissues (leaf and stem) and 2 species (A and B) with 3 biological replicates each. After performing differential expression analysis with DESeq, can I take the top 10 differentially expressed genes and see how their RPKM values compare? For example, for gene1, can I calculate the ratio of RPKM between A and B (A/B) for leaf and stem respectively?
I wanted to do this so that I have two independent analyses (one using RPKM and the other using DESeq). Is this the right approach, or should I compare the differentially expressed genes using normalized counts instead, or is there no need to compare at all? Any help is appreciated.
Thanks
Upendra
I'm afraid you might need to be more specific to get fruitful answers from the community. Assuming this question is actually a DESeq question, you might wanna state clearly what you're looking to achieve and why you have chosen this path. Then delve deeper and frame the question so it is actually about DESeq. Right now, while you have beautifully explained what you have, I'm not sure what you are looking to analyze.
Why are you talking about RPKM and DESeq in the same sentence? Please tell me that you're not putting rounded RPKM values into DESeq...
It sounds like Upendra wants to do two independent statistical analyses, one by DESeq and another by ratio of RPKM. I don't really see a question, other than 'is this a good idea?'
Consider the result as a Venn diagram, with some genes found DE by one method or the other but not both. What good does that do? What will you conclude about a gene found DE by one method and not the other? This is the path towards madness.
I wanted to be a bit civil about making that last statement, hence my stretched out but pointless comment :)
Ah, I pretty much stopped reading after seeing RPKM and DESeq mentioned together. I've simply seen way too many people try to use DESeq/edgeR/etc. with inappropriate data.
Anyway, it would seem to make more sense to perform the statistical tests and then filter by fold-change to get a list of likely biologically relevant changes.
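A minimal sketch of that "test first, then filter by fold-change" idea, assuming the test results have already been exported to rows with an adjusted p-value and a log2 fold change. The field names `padj` and `log2FoldChange`, the cutoffs, and the example values are all illustrative assumptions, not output from this experiment.

```python
# Hypothetical result rows; values are made up for illustration only.
results = [
    {"gene": "gene1", "log2FoldChange": 3.2, "padj": 0.001},
    {"gene": "gene2", "log2FoldChange": 0.4, "padj": 0.0005},
    {"gene": "gene3", "log2FoldChange": -2.5, "padj": 0.03},
    {"gene": "gene4", "log2FoldChange": 4.0, "padj": 0.6},
    {"gene": "gene5", "log2FoldChange": -1.8, "padj": None},  # no adjusted p-value
]


def filter_de(rows, padj_cutoff=0.05, min_abs_lfc=1.0):
    """Keep rows that pass the significance cutoff AND the fold-change cutoff."""
    kept = []
    for row in rows:
        padj = row["padj"]
        if padj is None:
            # Skip genes for which no adjusted p-value is available.
            continue
        if padj < padj_cutoff and abs(row["log2FoldChange"]) >= min_abs_lfc:
            kept.append(row)
    # Largest absolute change first.
    return sorted(kept, key=lambda r: abs(r["log2FoldChange"]), reverse=True)


selected = filter_de(results)
print([r["gene"] for r in selected])  # ['gene1', 'gene3']
```

Here gene2 is significant but changes too little, and gene4 changes a lot but is not significant, so neither makes the list.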
It's good to point this out and to be diplomatic. However, any given method for calling DE makes assumptions, and trying different methods can test the limits of these assumptions. One can perhaps have more confidence for genes in the intersection. Genes resulting from only one or the other may be near the edge, and if it's your favorite gene and perhaps providing a clue, it may be telling you something about your experiment. If you have no clues about any of the genes, and are simply doing a massive screen of all genes - then you're right, these are simply distracting, and considering them is pointless.
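For what it's worth, the intersection idea can be sketched with plain sets. The gene names and both hit lists below are hypothetical placeholders, not results from any real analysis.

```python
# Hypothetical DE calls from two methods (illustrative gene names only).
deseq_hits = {"gene1", "gene2", "gene3", "gene5"}
rpkm_ratio_hits = {"gene2", "gene3", "gene4"}

# Genes called by both methods: arguably the higher-confidence set.
both = deseq_hits & rpkm_ratio_hits

# Genes called by only one method: possibly near the edge of an assumption.
only_deseq = deseq_hits - rpkm_ratio_hits
only_rpkm_ratio = rpkm_ratio_hits - deseq_hits

print("both:", sorted(both))
print("DESeq only:", sorted(only_deseq))
print("RPKM ratio only:", sorted(only_rpkm_ratio))
```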
Thank you all for the replies and for your comments, suggestions and criticisms. @Devon I am not putting RPKM values into DESeq. I am using only counts for DESeq, so I guess I am OK there. @karl you are right: my intention was to perform two different analyses and then compare them, but now I am thinking it may not be a good idea, since a simple RPKM ratio has already been shown to be unreliable. So you all think I should just stick with DESeq and not bother with RPKM?
You're free to do either analysis, but think about your goals. What will you do with the genes found to be DE by one tool and not the other? Maybe ignore them, and keep the ones where both methods agree? Maybe a smaller gene list is what you're after, but most of us will warn you that a simple ratio of RPKM will mislead you with artifacts. I've seen some people report >4000-fold change due to a small RPKM in a ratio, and it's nonsense. DESeq will model the read distribution more accurately than a ratio. Maybe instead do just DESeq or edgeR, then when you have a good gene list, go ahead and look at only their RPKM ratios if those are easier for you to report.
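To make the small-RPKM artifact concrete, here is a rough sketch using the usual RPKM definition (reads per kilobase of transcript per million mapped reads). The gene length, library size, and read counts are invented for illustration.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def rpkm_ratio(count_a, count_b, gene_length_bp, total_a, total_b):
    """Naive A/B ratio of RPKM values; infinite when B has zero reads."""
    a = rpkm(count_a, gene_length_bp, total_a)
    b = rpkm(count_b, gene_length_bp, total_b)
    return float("inf") if b == 0 else a / b


# Assumed values: a 1500 bp gene and 20 million mapped reads per library.
LENGTH = 1500
LIB = 20_000_000

# Lowly expressed gene: a single extra read in B halves the ratio.
low_1 = rpkm_ratio(9, 1, LENGTH, LIB, LIB)
low_2 = rpkm_ratio(9, 2, LENGTH, LIB, LIB)

# Well expressed gene: the same one-read difference barely matters.
high_1 = rpkm_ratio(9000, 1000, LENGTH, LIB, LIB)
high_2 = rpkm_ratio(9000, 1001, LENGTH, LIB, LIB)

print(round(low_1, 2), round(low_2, 2))
print(round(high_1, 2), round(high_2, 2))
```

The ratio for the low-count gene swings from about 9 to about 4.5 on one read, which is the kind of instability a count-based model is meant to account for.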
Thanks @karl. Sounds good. But I have one final question. After I get the differentially expressed genes with a significant FDR, do I need to worry about reporting their RPKM ratios, given that DESeq itself has proved to be quite reliable for modeling RNA-Seq data and differential expression?
I wouldn't waste the time on it. Your target journal might ask for qPCR verification anyway. I'm more interested in the RPKM's ratio to whichever "housekeeping gene" your lab likes to use.