TMM or TPM for RNA-seq differential gene expression analysis
2.6 years ago
Mahmoud • 0

Hello,

I was wondering whether it is wrong to use TPM for differential gene expression analysis of RNA-seq data. The samples that I want to compare come from the same cell line; one group is untreated and the other is treated with a drug. I have read that TMM should be used when comparing different samples, tissues, or cell lines. In my case, however, although the samples are different, the cell line is the same; the only difference is that one group is treated with a drug and the other is not.

The reason I am asking is that our group's bioinformatician (who doesn't really have a biology background) insists that we have to use TMM because we are comparing different samples. However, I think TPM is fine here because it is the same cell line, just treated differently. As I understand it, TMM is used when comparing samples from different origins or different cell lines.

Lastly, looking at the data normalized by both methods, the TPM data make more sense and correlate with the biological validations that I have done in the lab and with the literature. Any input is greatly appreciated.

Thanks,

TPM DGEA sequence RNA TMM • 4.1k views
ADD COMMENT
2.6 years ago

You should use raw counts and then apply a statistical method.

Consult the following for some ideas of which statistical methods are in use:

STATISTICAL METHODS FOR BULK RNA SEQUENCING DIFFERENTIAL ANALYSIS

Current popular methods for bulk RNA-seq differential analysis can be classified into four categories based on the type of statistical method used:

  • t-test analogical methods (Cuffdiff and Cuffdiff2)
  • Poisson or negative binomial model-based methods (edgeR, DESeq, DESeq2, baySeq, EBSeq)
  • non-parametric methods (SAMseq and NOIseq) and
  • linear models (voom and sleuth).

As you can see there, directly comparing TPM or TMM values is not a recommended approach.

ADD COMMENT

Hello Istvan,

Thank you so much for this information. I think what the bioinformatician did was normalize the raw counts, using both TPM and TMM. Then edgeR was used for differential gene expression analysis on the TMM-normalized data. However, as I said, the TMM-normalized data don't make sense. My positive control genes, which change experimentally under the drug treatment, do not change in the RNA-seq data when it is normalized via TMM, whereas the TPM data correlate with my experiments. That is why I was wondering why not to use the TPM-normalized data for differential gene expression analysis.

Thanks,

Mahmoud

ADD REPLY

Read the edgeR or DESeq2 papers. Gene-level TPMs generally throw away too much information for proper DE methods, which generally try to account for composition bias as referenced in this answer to a previous question.

TPMs are useful for comparing expression within a sample but not comparable between samples.
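
To make the "not comparable between samples" point concrete, here is a minimal toy sketch. The four genes, their lengths, the per-cell abundances and the idealized `sequence` helper are all made up for illustration; nothing here comes from the poster's data. Genes A to C have the same true abundance in both conditions, and only gene D is induced by the treatment.

```python
def sequence(true_abundance, depth):
    """Idealized sequencing (hypothetical helper): reads are split
    in exact proportion to each gene's true abundance."""
    total = sum(true_abundance)
    return [round(a * depth / total) for a in true_abundance]


def tpm(counts, lengths_kb):
    """Transcripts per million for a single sample."""
    rates = [c / l for c, l in zip(counts, lengths_kb)]  # reads per kb
    total = sum(rates)
    return [r * 1e6 / total for r in rates]


genes = ["A", "B", "C", "D"]
lengths_kb = [1.0, 1.0, 1.0, 1.0]        # assumed equal gene lengths

# Assumed true per-cell abundances: only gene D responds to the drug.
true_untreated = [100, 100, 100, 100]
true_treated = [100, 100, 100, 700]

# Both libraries are sequenced to the same depth of 1000 reads.
counts_untreated = sequence(true_untreated, 1000)
counts_treated = sequence(true_treated, 1000)

tpm_untreated = tpm(counts_untreated, lengths_kb)
tpm_treated = tpm(counts_treated, lengths_kb)

for g, u, t in zip(genes, tpm_untreated, tpm_treated):
    print(f"gene {g}: untreated TPM = {u:.0f}, treated TPM = {t:.0f}")
```

In this toy setup, genes A to C drop from 250000 to 100000 TPM even though their true abundance did not change, simply because gene D takes up a larger share of the fixed read budget. That is the composition effect that between-sample normalization methods try to correct.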

ADD REPLY

Seconding that. If there were no notable composition bias, TMM and TPM should agree closely (except for the gene-length correction, of course). It is suspicious that the two differ so much on the actual message here, namely whether your wet-lab results are confirmed or rejected. Be sure to review the results carefully to rule out the possibility that the TMM result is actually correct, because TMM is usually the preferred normalization and, as said, without composition bias the two should be very similar; see also TMM-Normalization. In any case, it sounds suspicious to me.
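
A rough sketch of why the two should agree when there is no composition bias. This is a deliberately simplified, unweighted trimmed mean of log-ratios, not edgeR's actual implementation: the real TMM also trims on absolute expression, weights genes by precision, and rescales the factors across samples. The function name `simple_tmm_factor` and the toy counts are assumptions for illustration only.

```python
import math


def simple_tmm_factor(sample, reference, trim=0.3):
    """Simplified TMM-style scaling factor of `sample` relative to
    `reference`: trimmed mean of per-gene log2 ratios of
    library-size-scaled counts (unweighted, trimmed on M-values only)."""
    n_s, n_r = sum(sample), sum(reference)
    m_values = sorted(
        math.log2((s / n_s) / (r / n_r))
        for s, r in zip(sample, reference)
        if s > 0 and r > 0          # log-ratio undefined for zero counts
    )
    k = int(len(m_values) * trim)   # number of genes trimmed from each end
    kept = m_values[k:len(m_values) - k]
    return 2 ** (sum(kept) / len(kept))


reference = [250, 250, 250, 250]    # toy untreated counts

# Case 1: same composition, just sequenced twice as deeply.
deeper_only = [500, 500, 500, 500]
factor_depth = simple_tmm_factor(deeper_only, reference)

# Case 2: same depth, but one gene dominates the library.
one_gene_up = [100, 100, 100, 700]
factor_composition = simple_tmm_factor(one_gene_up, reference)

print(f"depth difference only: factor = {factor_depth:.2f}")
print(f"composition shift:     factor = {factor_composition:.2f}")
```

With a pure depth difference the factor is 1, so plain library-size scaling (and hence TPM-like proportions) gives the same picture. With the composition shift the factor is about 0.4: the effective library size of the second sample shrinks, and the three unchanged genes come out equal to the reference again.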

ADD REPLY

Can you please briefly explain what composition bias is and what factors cause it? Is it related to the sequencing reaction itself? If my compound changes the mRNA of the cell line, increasing the mRNA levels of specific genes and decreasing others, then I would like to see this in the analysis.

ADD REPLY

The link in my comment has an example, but this video from the 1:00-4:00 mark is an excellent primer on the library composition problem. Only three minutes! The rest of the video (~12 minutes total) explains DESeq2's method for library normalization to deal with this (and other) potential difference(s) between samples, which may be helpful to understand as well.

If you need a more in-depth explanation, I'd take a look at the original DESeq2 or edgeR papers.
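
For readers who prefer code to video, below is a minimal sketch of the median-of-ratios idea behind DESeq2-style size factors. It is a simplification under stated assumptions: the toy counts and the function name `size_factors` are made up, and the real package handles many details this ignores.

```python
import math
from statistics import median


def size_factors(samples):
    """Median-of-ratios size factors (simplified sketch).
    `samples` is a list of per-sample count lists in the same gene order."""
    n_genes = len(samples[0])
    # Only genes with non-zero counts in every sample have a usable log.
    usable = [g for g in range(n_genes) if all(s[g] > 0 for s in samples)]
    # Log of each gene's geometric mean across samples (pseudo-reference).
    log_geo = {
        g: sum(math.log(s[g]) for s in samples) / len(samples)
        for g in usable
    }
    # Size factor: median over genes of the sample / pseudo-reference ratio.
    return [
        math.exp(median(math.log(s[g]) - log_geo[g] for g in usable))
        for s in samples
    ]


# Toy counts: genes A-C unchanged per cell, gene D induced by treatment,
# both libraries sequenced to the same total depth of 1000 reads.
untreated = [250, 250, 250, 250]
treated = [100, 100, 100, 700]

factors = size_factors([untreated, treated])
normalized = [
    [c / f for c in sample]
    for sample, f in zip([untreated, treated], factors)
]

for name, row in zip(["untreated", "treated"], normalized):
    print(name, [round(v, 1) for v in row])
```

Because the median is driven by the majority of genes, the single induced gene does not distort the scaling: after normalization genes A to C are equal in both samples and gene D shows a 7-fold increase, which matches the assumed true change.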

ADD REPLY

If you need a more in-depth explanation, I'd take a look at the original DESeq2 or edgeR papers.

But they are written in Latin.

ADD REPLY

Just to make sure that we are not misunderstanding each other:

Do not compute TPM and then run a method like DESeq or edgeR on top of the normalized data. Those methods also normalize the data, so you would end up normalizing twice, and of course TMM will then have a huge effect.

Let the method itself apply the normalization to the raw data, then ask the method to provide you with the normalized matrix (usually there is a function call that returns it).

What I am saying is: don't apply a normalization twice.

ADD REPLY

I doubt that she did that, but it doesn't hurt to ask.

ADD REPLY
2.6 years ago
Mahmoud • 0

Hello everyone,

I would just like to mention that I am not a bioinformatician. I am a biologist, and the analysis was done by the team's bioinformatician, who doesn't have a biology background. That is why it is a little tough to communicate.

The first thing I did when I was given the TPM and TMM files was to look at my positive control genes. I know for a fact from my wet-lab experiments that these genes change (either up- or down-regulated). I found that those genes do in fact change in the TPM file; however, no change was observed in the TMM file. In addition, the bioinformatician performed DE using edgeR because, I guess, she read somewhere that TMM must be used to compare different RNA-seq samples. I am not sure how I can trust a DE analysis based on TMM normalization when my positive controls do not change under this method; it doesn't make any sense.

Can someone please explain why I cannot use TPM for DE? I keep hearing that it cannot be used to compare different samples and that it doesn't account for composition bias. Does composition bias exist because different cell lines or tissues have different RNA compositions, or is it related to the sequencing reaction itself? I have two samples of the same cell line, one untreated and the other treated with a compound. If the biological effect of the compound is to change the total RNA of the cell line, then I would very much like to see this in my DE analysis.

I apologize for the long email, and thank you everyone.

ADD COMMENT
