Question

TPM to differential expression

0

3.5 years ago

pramodkhadka69 • 0

I have to analyze a publicly available dataset. In that experiment, they have two conditions, one mock and one treatment. They have published the TPM values of each gene for each condition. For the differential expression, they simply subtracted the TPM of the treatment from the TPM of the mock. Is there a way I can convert that to fold change? Any help would be appreciated.

RNA-seq • 4.7k views

ADD COMMENT • link updated 3.5 years ago by rpolicastro 13k • written 3.5 years ago by pramodkhadka69 • 0

score 2 · Answer 1 · 2021-05-17

2

3.5 years ago

Ram 44k

TPM cannot be compared directly across samples, as it does not address sample-specific variation. TPM is a proportion within each sample, so a transcript expressed in one sample/condition but not the other shifts the TPM of every other transcript and creates problems with the metric.

You should look at obtaining the raw counts if you wish to perform DE analysis. Subtracting TPM is utterly ineffectual and nonsensical as a metric of comparison except if the comparison is within the same sample.
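To make the composition problem concrete, here is a toy sketch (made-up counts and gene names, not from any real dataset). Genes A, B and C have identical counts in both samples, and the only change is that a hypothetical gene D switches on in the treated sample. All gene lengths are assumed to be 1 kb to keep the arithmetic simple.

```python
def tpm(counts, lengths_kb):
    """Compute TPM: length-normalise each count, then scale so the sample sums to 1e6."""
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rates.values())
    return {gene: 1e6 * rate / total for gene, rate in rates.items()}

# Hypothetical toy data: all genes are 1 kb long (an assumption for illustration).
lengths_kb = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}
mock_counts = {"A": 100, "B": 100, "C": 100, "D": 0}
treated_counts = {"A": 100, "B": 100, "C": 100, "D": 300}

tpm_mock = tpm(mock_counts, lengths_kb)
tpm_treated = tpm(treated_counts, lengths_kb)

for gene in lengths_kb:
    diff = tpm_treated[gene] - tpm_mock[gene]
    print(f"{gene}: mock={tpm_mock[gene]:.0f} treated={tpm_treated[gene]:.0f} diff={diff:.0f}")
```

In this toy case the TPM of A, B and C is halved in the treated sample even though their counts are unchanged, so a plain subtraction would flag all three as strongly "down".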

ADD COMMENT • link 3.5 years ago by Ram 44k

0

Thanks for the input. I was not able to get raw counts, unfortunately. I have encountered 2 published databases where they have used subtraction as a way to find differential expression between treatment and mock. Actual quote from the paper's methodology: "Differential gene expression was assessed by subtracting the number of transcripts (TPM) in COR-treated samples from that in the time-matched, mock-treated sample."

ADD REPLY • link 3.5 years ago by pramodkhadka69 • 0

1

It can be the case that published methodologies are incorrect, and in this case I would agree with Ram that you should not do this. There are a few important considerations in RNA-seq differential expression analysis: overdispersion, compositional bias, and library size differences. I would argue that simply subtracting TPMs is no better than a random DEG generator.

If the data were published, the FASTQ files should be available on SRA or ENA. Do you have a link to the paper or database? If the worst-case scenario is true and you only have TPM values, you could potentially use limma on the log(TPM+1) values, although this is very suboptimal.
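As for the original question of turning TPM into a fold change, a minimal sketch is below. It uses a log2 ratio with a pseudocount of 1, mirroring the log(TPM+1) idea above. The function name, the pseudocount value and the example numbers are all assumptions for illustration. This is only a descriptive effect size, not a statistical test, and it inherits all the caveats about comparing TPM across samples.

```python
import math

def log2_fold_change(tpm_treatment, tpm_mock, pseudocount=1.0):
    """Return log2((treatment + pseudocount) / (mock + pseudocount)).

    The pseudocount (assumed to be 1 here) avoids division by zero and
    damps ratios for genes with very low TPM in both conditions.
    """
    return math.log2((tpm_treatment + pseudocount) / (tpm_mock + pseudocount))

# Hypothetical TPM values as (mock, treatment); both genes differ by 10 TPM.
examples = {
    "high_gene": (1000.0, 1010.0),
    "low_gene": (1.0, 11.0),
}

for gene, (mock, treatment) in examples.items():
    difference = treatment - mock
    lfc = log2_fold_change(treatment, mock)
    print(f"{gene}: difference={difference:.1f} log2FC={lfc:.3f}")
```

Both hypothetical genes have the same TPM difference, but on the log2 ratio scale the low-expressed gene changes several-fold while the high-expressed gene barely moves, which is one reason a raw subtraction is hard to interpret.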

ADD REPLY • link 3.5 years ago by rpolicastro 13k