Question

TPM vs FPKM vs TMM and k-mer-specific expression quantification

8.2 years ago

user230613 ▴ 380

Hi Biostars!

I'm trying to find the best approach to achieve my goal. I would like to measure the expression of a specific set of k-mers (10-aa peptide sequences) across different samples. I have RNA-Seq data for each sample. I'm planning to use alignment-free methods such as Kallisto or Salmon. My first question is:

1) How can I measure the expression of a specific k-mer when this k-mer is not unique in the genome? For example, it might be encoded by different transcripts of the same gene, or by different genes that share the sequence. How can I measure the global abundance of that k-mer in the sample?

On the other hand:

2) Which method should I choose for measuring the expression? I have read that TPM has superseded FPKM. But both TPM and FPKM measure relative abundance, so maybe I should consider measuring absolute abundance with TMM instead. Is there any specific case in which a relative measurement is preferred over an absolute one?

Thank you in advance,

updated 8.0 years ago by Rob 7.1k • written 8.2 years ago by user230613 ▴ 380

score 3 · Answer 1 · 2017-08-01

Kallisto and Salmon quantify transcript expression, not kmer expression. They use kmer matches between reads and transcripts to quantify transcript expression, as estimated by read counts.
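As a rough illustration of the k-mer matching step, the toy Python sketch below indexes every k-mer of each transcript and then intersects the transcript sets hit by a read's k-mers, giving the set of transcripts the read is compatible with. The function names and the tiny sequences are hypothetical. Kallisto and Salmon use much more efficient index structures and additional checks, so this is not their actual algorithm, only the underlying idea.

```python
from typing import Dict, FrozenSet, Optional, Set


def build_kmer_index(transcripts: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map every k-mer to the set of transcript IDs whose sequence contains it."""
    index: Dict[str, Set[str]] = {}
    for tid, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(tid)
    return index


def compatible_transcripts(read: str, index: Dict[str, Set[str]], k: int) -> FrozenSet[str]:
    """Intersect the transcript sets of the read's k-mers that occur in the index."""
    result: Optional[Set[str]] = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            continue  # k-mer absent from every transcript (e.g. a sequencing error)
        result = set(hits) if result is None else result & hits
    return frozenset(result) if result else frozenset()


transcripts = {"tA": "ACGTACGTTT", "tB": "ACGTACGAAA"}
idx = build_kmer_index(transcripts, k=5)
print(sorted(compatible_transcripts("ACGTACG", idx, 5)))  # ['tA', 'tB']
print(sorted(compatible_transcripts("TACGTTT", idx, 5)))  # ['tA']
```

The first read lies in the sequence shared by both transcripts, so k-mer matching alone cannot say which one it came from; resolving such reads is what the quantification step is for.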

1) How can I measure the expression of specific Kmer when this Kmer is not unique in the genome?

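With an expectation-maximization (EM) algorithm. Kallisto and Salmon use EM to apportion reads that are compatible with several transcripts among those transcripts, according to the transcripts' estimated abundances.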
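To make the EM idea concrete, here is a minimal Python sketch over equivalence classes, i.e. sets of transcripts that a group of reads is compatible with, together with the number of such reads. It assumes all transcripts have equal length and runs a fixed number of iterations without a convergence check; `em_abundance` is a hypothetical name. Kallisto and Salmon use more elaborate versions that account for effective transcript length, among other things, so treat this as an illustration rather than their implementation.

```python
from typing import Dict, FrozenSet


def em_abundance(eq_counts: Dict[FrozenSet[str], int], n_iter: int = 200) -> Dict[str, float]:
    """Estimate the fraction of reads coming from each transcript by EM.

    eq_counts maps a set of compatible transcripts to the number of reads
    compatible with exactly that set. Transcript lengths are ignored
    (all assumed equal) to keep the sketch short.
    """
    transcripts = sorted({t for ec in eq_counts for t in ec})
    theta = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    total = sum(eq_counts.values())
    for _ in range(n_iter):
        expected = {t: 0.0 for t in transcripts}
        # E-step: split each class's reads in proportion to current abundances
        for ec, count in eq_counts.items():
            denom = sum(theta[t] for t in ec)
            for t in ec:
                expected[t] += count * theta[t] / denom
        # M-step: the new abundances are the expected read fractions
        theta = {t: expected[t] / total for t in transcripts}
    return theta


eq = {frozenset({"tA"}): 30, frozenset({"tA", "tB"}): 60, frozenset({"tB"}): 10}
theta = em_abundance(eq)
print(round(theta["tA"], 3), round(theta["tB"], 3))  # 0.75 0.25
```

The 60 shared reads end up split 45/15 rather than evenly, because the uniquely assigned reads indicate that tA is the more abundant transcript.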

2) Which method should I choose for measuring the expression?

What do you want to do? TPM and FPKM are within-sample normalizations, intended to allow comparison of expression levels of different genes from the same sample. They are not needed for differential transcript expression analysis between samples.
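To show what "within-sample" means here, the minimal Python sketch below computes both measures from per-transcript read counts and lengths. It uses plain transcript lengths where real tools use effective lengths, and the function names are illustrative only. TPM values always sum to one million within a sample, so each value is a proportion of that sample's transcripts; FPKM carries the same information and converts to TPM by dividing each value by the sample's total FPKM and multiplying by 1e6.

```python
from typing import Dict


def fpkm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Fragments per kilobase of transcript per million mapped fragments."""
    per_million = sum(counts.values()) / 1e6
    return {t: counts[t] / (lengths_bp[t] / 1e3) / per_million for t in counts}


def tpm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Transcripts per million: divide by length first, then rescale so values sum to 1e6."""
    rate = {t: counts[t] / lengths_bp[t] for t in counts}
    total_rate = sum(rate.values())
    return {t: 1e6 * r / total_rate for t, r in rate.items()}


counts = {"t1": 100.0, "t2": 300.0}
lengths = {"t1": 1000.0, "t2": 3000.0}
print(tpm(counts, lengths))   # both transcripts get the same TPM; values sum to 1e6
print(fpkm(counts, lengths))  # both get the same FPKM, but these need not sum to 1e6
```

Neither measure says anything about how much RNA a sample contained overall, which is why between-sample comparisons need a separate normalization.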

score 3 · Answer 2 · 2017-08-01

8.0 years ago

Rob 7.1k

To add to h.mon's answer: there is generally no "absolute" measurement of transcript expression. For example, the number of reads assigned to each transcript depends on sampling depth, relative abundances, etc. I have written a blog post on some of the common expression measurements, which you can read here. Salmon outputs both TPM and the estimated number of reads assigned to each transcript. The former is useful for within-sample analysis. To perform, e.g., differential expression testing, you can read Salmon's output using a tool like tximport. This lets you import all of your quantified samples directly into R in a way that allows the built-in between-sample normalization approaches of tools like DESeq2 and edgeR to be applied directly.
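tximport and the R packages themselves are not reproduced here. As a rough Python illustration of what a between-sample normalization does, the sketch below computes median-of-ratios size factors, a simplified version of the idea behind DESeq2's default normalization (edgeR's default, TMM, is a different trimmed-mean method and is not shown). `size_factors` is a hypothetical helper: it skips genes with a zero count in any sample and does none of DESeq2's additional handling, so in practice you would run DESeq2 or edgeR on the tximport output instead.

```python
import math
from statistics import median
from typing import Dict, List


def size_factors(counts: Dict[str, List[float]]) -> List[float]:
    """Median-of-ratios size factors, one per sample (simplified sketch).

    counts maps a gene/transcript ID to its raw counts, one per sample.
    Genes with a zero count in any sample are skipped because their
    geometric mean is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Each sample's factor is the median ratio of its counts to the per-gene geometric means
    return [math.exp(median(r)) for r in log_ratios]


# Sample 2 looks like sample 1 sequenced twice as deeply; g4 is skipped (zero count)
counts = {"g1": [10, 20], "g2": [50, 100], "g3": [5, 10], "g4": [0, 7]}
sf = size_factors(counts)
normalized = {g: [c / s for c, s in zip(row, sf)] for g, row in counts.items()}
print([round(s, 3) for s in sf])  # [0.707, 1.414]
```

After dividing by the size factors, g1 to g3 have identical normalized counts in both samples, i.e. the depth difference has been removed.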

Hi Rob. Could you guide me a bit? If I want to find the expression of a given k-mer (10 nt) that is shared by two transcripts, should I add up the TPMs of both transcripts?

7.8 years ago by user230613 ▴ 380
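The approach asked about here, summing the TPM of every transcript that contains the k-mer, can be sketched in Python as below. This illustrates the question's proposal only; no reply to it appears in this thread. It assumes transcript sequences are available as a dict of strings and TPMs as a dict keyed by the same IDs (e.g., from the TPM column of Salmon's quant.sf), and that the k-mer is a nucleotide string on the transcript strand. `count_occurrences` and `kmer_tpm` are hypothetical helpers.

```python
from typing import Dict


def count_occurrences(kmer: str, seq: str) -> int:
    """Count (possibly overlapping) occurrences of kmer in seq."""
    return sum(1 for i in range(len(seq) - len(kmer) + 1) if seq.startswith(kmer, i))


def kmer_tpm(kmer: str, transcripts: Dict[str, str], tpm: Dict[str, float],
             by_occurrence: bool = False) -> float:
    """Sum the TPM of every transcript whose sequence contains kmer.

    By default each containing transcript counts once; with by_occurrence=True
    its TPM is weighted by how many times the k-mer occurs in it.
    """
    total = 0.0
    for tid, seq in transcripts.items():
        n = count_occurrences(kmer, seq)
        if n:
            total += tpm.get(tid, 0.0) * (n if by_occurrence else 1)
    return total


transcripts = {"tA": "ACGTACGTAC", "tB": "TTTTACGTAC", "tC": "GGGGGGGGGG"}
tpm = {"tA": 10.0, "tB": 5.0, "tC": 100.0}
print(kmer_tpm("ACGTAC", transcripts, tpm))                      # 15.0
print(kmer_tpm("ACGTAC", transcripts, tpm, by_occurrence=True))  # 25.0
```

Because the TPMs of transcripts in one sample share a single scale, the sum is still a within-sample proportion; comparing it across samples carries the same caveats discussed in the answers above.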

Hello again, I wonder if you could guide me a little with my previous question? Thank you :)

7.6 years ago by user230613 ▴ 380