Hi All,
I want to analyze RNA-seq datasets generated from different experiments, and compare the expression of selected genes across those samples. One of the experiments has 2 conditions (WT1, Drug1), and the other dataset has 4 samples (WT2, Drug2, Drug3, Drug4). The WT1 and WT2 samples are similar but not identical. What I want to do is compare the expression of selected gene sets in the Drug1 and Drug2 samples.
I used kallisto to quantify all 6 samples together, and used either DESeq2 or sleuth for cross-sample normalization. I could extract the cross-sample normalized TPM values from sleuth or DESeq2. Now I want to look at the normalized TPMs in the Drug1 and Drug2 samples to check whether one of them is higher or lower. But I have a few questions:
- Does this approach sound reasonable, or are there better ways to do it?
- Because the datasets are from different labs/experiments, how do I know that the cross-sample normalization worked?
- I tried plotting a boxplot of the TPMs extracted from DESeq2, and some of the medians are slightly different. Does that mean I need a different normalization approach?
Any suggestions and help are greatly appreciated.
Thank you! Urja
Thanks a lot Devon. A few clarifications if you could please:
When you say, "add experiment1 and experiment2 to your design", do you mean that I should just relabel WT1 AND WT2 samples to WT, so that WT is shared across the two experiments (with 6 replicates now)? Or something more complicated like covariates etc?
And about the cross-sample normalization: I thought that for a proper normalization, the medians should be well aligned if I plot a TPM boxplot. But in my case the medians of the Drug1 samples are slightly higher than the rest. So I am worried that if Drug1 has higher TPMs overall, the differences could just be because the normalization did not do a good job. Or maybe it doesn't matter?
Thanks again for your help. Urja
Yes, exactly. You can then add an experiment variable to the design with values 1 and 2.
It's not so much the TPMs, but the medians that should be quite similar across samples. How different are the values you're seeing? Can you post a plot?
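To make the design suggestion concrete: with WT shared across both experiments and an experiment covariate in the design, the Drug1 vs Drug2 comparison roughly amounts to comparing each drug's change relative to the WT of its own experiment. Below is a minimal sketch in plain Python for a single gene. All values and names are made up for illustration, and this is only the arithmetic behind the contrast, not what DESeq2 does internally (it fits a negative binomial model to counts).

```python
from statistics import mean

# Hypothetical log2 normalized expression for one gene, three replicates each.
log2_expr = {
    ("exp1", "WT"):    [10.0, 10.5, 9.5],
    ("exp1", "Drug1"): [12.0, 12.25, 11.75],
    ("exp2", "WT"):    [11.0, 11.5, 10.5],   # experiment 2 sits about 1 unit higher
    ("exp2", "Drug2"): [12.0, 12.5, 11.5],
}

def group_mean(experiment, condition):
    """Mean log2 expression of one experiment/condition group."""
    return mean(log2_expr[(experiment, condition)])

# Naive comparison: ignores that the two drugs come from different experiments.
naive = group_mean("exp1", "Drug1") - group_mean("exp2", "Drug2")

# Experiment-aware comparison: each drug relative to the WT of its own experiment.
drug1_vs_wt = group_mean("exp1", "Drug1") - group_mean("exp1", "WT")
drug2_vs_wt = group_mean("exp2", "Drug2") - group_mean("exp2", "WT")
adjusted = drug1_vs_wt - drug2_vs_wt

print(f"naive Drug1 - Drug2:    {naive:+.2f}")
print(f"adjusted Drug1 - Drug2: {adjusted:+.2f}")
```

With these made-up numbers the naive difference is 0, while the experiment-aware difference is 1 log2 unit, because the shift between the two WT groups is attributed to the experiment rather than to the drugs.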
Thanks Devon. Please see the plot below. This is the normalized TPM that I extracted from sleuth (which uses DESeq2's cross-sample normalization, I think). WT and Drug1 have slightly higher medians than the rest of the data.
How much difference in the medians is acceptable before going ahead with the DE analysis?
I have to admit to not being overly familiar with sleuth. What happens if you use DESeq2 with tximport? Do you have similar issues?
I tried that and got very similar results. Please let me know what your thoughts are in that case. Thank you.
I suspect it's fine then and due to plotting TPMs rather than counts. I expect there's some isoform switching going on due to drugs 2-4.
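For anyone who wants to check the normalization on counts rather than TPMs, here is a rough sketch of the median-of-ratios idea that DESeq2's size factors are based on. The counts are made up and the variable names are hypothetical. It is plain Python for illustration, not DESeq2's actual code.

```python
import math
from statistics import median

# Hypothetical raw counts: keys are genes, list positions are samples.
counts = {
    "geneA": [100, 210, 95],
    "geneB": [50, 98, 52],
    "geneC": [400, 790, 410],
    "geneD": [20, 45, 19],
    "geneE": [1000, 2100, 300],  # changes strongly in the third sample
}
n_samples = 3

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Per-gene reference: geometric mean across samples (genes with a zero are skipped).
reference = {
    gene: geometric_mean(row)
    for gene, row in counts.items()
    if all(v > 0 for v in row)
}

# Size factor of a sample: median ratio of its counts to the per-gene reference.
size_factors = [
    median(counts[gene][j] / ref for gene, ref in reference.items())
    for j in range(n_samples)
]

# Normalized counts: raw counts divided by the sample's size factor.
normalized = {
    gene: [row[j] / size_factors[j] for j in range(n_samples)]
    for gene, row in counts.items()
}

# Diagnostic: after normalization, the median ratio to the reference is 1 per sample.
median_ratios = [
    median(normalized[gene][j] / ref for gene, ref in reference.items())
    for j in range(n_samples)
]

print("size factors: ", [round(s, 3) for s in size_factors])
print("median ratios:", [round(m, 3) for m in median_ratios])
```

The quantity that this kind of normalization aligns is the median ratio of counts to a per-gene reference. TPMs are additionally rescaled by transcript length and forced to sum to the same total per sample, so their medians can drift when a subset of transcripts changes, even if the count normalization is fine.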