Hello,
I was wondering if it is wrong to use TPM for differential gene expression analysis of RNA-seq data. The samples that I want to compare are technically the same cell line, but one is untreated and the other is treated with a drug. I have read that TMM should be used if you are comparing different samples, different tissues, or different cell lines. However, in my case, even though I am comparing different samples, the cell line is the same; the only difference is that one group is treated with a drug while the other is not.
The reason I am asking is that our group's bioinformatician (who doesn't really have a biology background) insists that we have to use TMM because we are comparing different samples. However, I think TPM is fine here because technically it is the same cell line, just treated differently. I know that TMM is used when comparing samples from different origins or different cell lines.
Lastly, looking at the data normalized via both methods, the TPM data make more sense and correlate with the biological validations that I have done in the lab and with the literature. Any input is greatly appreciated.
Thanks,
Hello Istvan,
Thank you so much for this information. I think what the bioinformatician did was normalize the raw counts with both TPM and TMM. Then edgeR was used for differential gene expression analysis on the TMM-normalized data. However, as I said, the TMM-normalized data doesn't make sense: my positive control genes, which change experimentally under the drug treatment, do not change in the RNA-seq data when it is normalized via TMM. The TPM data, however, correlate with my experiments. That is why I was wondering why we can't use the TPM-normalized data for differential gene expression analysis.
Thanks,
Mahmoud
Read the edgeR or DESeq2 papers. Gene-level TPMs generally throw away too much information for proper DE methods, which generally try to account for composition bias as referenced in this answer to a previous question.
TPMs are useful for comparing expression within a sample, but they are not directly comparable between samples.
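To make the point about between-sample comparisons concrete, here is a minimal sketch in plain Python. The gene lengths and abundances are made-up toy numbers (not from the poster's data), and `tpm` is just a hypothetical helper implementing the usual TPM definition: divide by gene length, then rescale so each sample sums to one million.

```python
def tpm(counts, lengths_kb):
    """Transcripts per million for one sample."""
    rates = [c / l for c, l in zip(counts, lengths_kb)]  # reads per kilobase
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

# Hypothetical toy data: four genes, each 1 kb long.
lengths_kb = [1.0, 1.0, 1.0, 1.0]
untreated = [100, 100, 100, 100]
# Assumption: the drug truly changes only gene 4 (10x up); genes 1-3 are unchanged.
treated = [100, 100, 100, 1000]

tpm_untreated = tpm(untreated, lengths_kb)
tpm_treated = tpm(treated, lengths_kb)
ratios = [t / u for t, u in zip(tpm_treated, tpm_untreated)]

for gene, r in enumerate(ratios, start=1):
    print(f"gene{gene}: treated/untreated TPM ratio = {r:.2f}")
```

Because every sample is forced to sum to one million, the three unchanged genes come out at a ratio of about 0.31 and the truly 10x gene at only about 3.08. That shift is the composition effect, not biology.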
Seconding that. If there were no notable composition bias, TMM and TPM should agree closely (except for the gene length correction, of course). It is suspicious that the two differ so much that your wet-lab results are confirmed by one and rejected by the other. Carefully review the results to rule out the possibility that the TMM version is actually the correct one, because TMM is usually the preferred normalization and, as said, without composition bias the two should be very similar. See also TMM-Normalization. In any case, it sounds suspicious to me.
Can you please briefly explain what composition bias is and which factors cause it? Is it related to the sequencing reaction itself? If my compound somehow changes the mRNA content of the cell line, increasing the mRNA levels of specific genes and decreasing others, then I would like to see this in the analysis.
The link in my comment has an example, but this video from the 1:00-4:00 mark is an excellent primer on the library composition problem. Only three minutes! The rest of the video (~12 minutes total) explains DESeq2's method for library normalization to deal with this (and other) potential difference(s) between samples, which may be helpful to understand as well.
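For readers who prefer numbers to video, below is a rough sketch of the median-of-ratios idea in plain Python. It is a simplified illustration written for this thread, not DESeq2's actual code; `size_factors` and the toy count matrix are hypothetical. The assumption in the toy data is that sample 2 was sequenced twice as deep and that only the last gene truly changes (10x up).

```python
import math
from statistics import median

def size_factors(counts):
    """counts: list of per-gene rows, one raw count per sample."""
    n_samples = len(counts[0])
    # Keep genes observed in every sample so the logarithm is defined.
    rows = [row for row in counts if all(c > 0 for c in row)]
    log_ratios = [[] for _ in range(n_samples)]
    for row in rows:
        # Per-gene reference: log of the geometric mean across samples.
        log_ref = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_ref)
    # The median ignores the few genes that really change.
    return [math.exp(median(lr)) for lr in log_ratios]

# Hypothetical raw counts, columns = (untreated, treated).
counts = [
    [100, 200],
    [100, 200],
    [100, 200],
    [100, 200],
    [100, 2000],
]

factors = size_factors(counts)
normalized = [[c / f for c, f in zip(row, factors)] for row in counts]
fold_changes = [row[1] / row[0] for row in normalized]
print([round(fc, 2) for fc in fold_changes])
```

Since the median is taken over genes, one strongly induced gene does not drag the scaling factor with it: the four unchanged genes come back with a fold change of 1 and the induced gene with 10. Scaling by library totals (500 vs 2800 reads here) would instead make the unchanged genes look down-regulated.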
If you need a more in-depth explanation, I'd take a look at the original DESeq2 or edgeR papers.
But they are written in Latin.
Just to make sure that we are not misunderstanding each other:
Do not compute TPM and then run a method like DESeq or edgeR on top of the normalized data. Those methods normalize the data themselves, so you would end up normalizing twice, and of course TMM will have a huge effect on that.
Let the method itself apply the normalization to the raw data, then ask the method to provide you with the normalized matrix (usually there is a function call that returns it).
What I am saying is: don't apply a normalization twice.
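One cheap way to catch this before running anything is to check that the matrix handed to the DE tool still looks like raw counts. The sketch below is only a heuristic, and `looks_like_raw_counts` is a hypothetical helper, not part of edgeR or DESeq: it relies on the assumption that raw counts are non-negative whole numbers, while TPM-like values usually are not.

```python
def looks_like_raw_counts(matrix):
    """Heuristic: raw counts are non-negative whole numbers."""
    return all(
        value >= 0 and float(value).is_integer()
        for row in matrix
        for value in row
    )

# Hypothetical raw counts: rows = genes, columns = samples.
raw = [[10, 20], [30, 5], [7, 0]]

# A per-million rescaling of the same data, standing in for an
# already-normalized matrix such as TPM.
col_totals = [sum(col) for col in zip(*raw)]
per_million = [[v / t * 1e6 for v, t in zip(row, col_totals)] for row in raw]

print(looks_like_raw_counts(raw))          # True
print(looks_like_raw_counts(per_million))  # False
```

A matrix that fails this check has probably been normalized already and should not be fed into a count-based method. Passing the check does not prove the data are raw (rounded normalized values would slip through), so it is a sanity check rather than a guarantee.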
I doubt that she did that, but it doesn't hurt to ask.