I did two batches of RNA-seq upon same drug treatment but at different time points on the same cell line. For each time point, I have corresponding DMSO controls. All my Fold Change values are calculated over the corresponding DMSO control. I'm trying to trace the gene expression dynamics upon my drug treatment. I found that if I simply merge two datasets into one master table and plot the RPKM and Fold Change values, the numbers do not reflect the real dynamics I saw by q-PCR. For example: I knew one gene (and a few more) that can be induced highest upon 4-day treatment but the RPKM values and Fold Changes do not show the same trend. So my question is:
What's the best way to address this issue?
Appreciate your help.
Hang
Below are the scripts I used.
Please use
ADD COMMENT/ADD REPLY
when responding to existing posts to keep threads logically organized.Ok, then perhaps you could try to add the batch into the model/design. Although if you processed the DMSO/drug samples at the same time, it shouldn't be very impactful.
Is your fold change also computed based on RPKM values ? RPKM is not a proper between sample normalization metric so I suggest that you try a different method such as edgeR/DESEQ.
Thanks for the reply. My fold change is computed based on counts values. I used edgeR to calculate counts and generated RPKM values.
What was different between the two batches? In addition, as said above you shouldn't use RPKM, use statistics as in DESeq2
As mentioned by @tud55122 and @Asaf you shouldn't use RPKM values. I have used DESeq2 and there you have option to use batch+ condition information to make comparisons. I am not aware is edgeR ihas similar options. If you have counts data then you can easily follow DESeq2 vignette to see how to do that. I use the 'vst' based normalization which is better than RPKM based normalization. Hope it helps.
Seems you have two timepoints? Two conditions? and then a number of replicates per condition/timepoint - could you describe exactly which samples were sequenced together?