Converting a transcript_count_matrix obtained from RNA-seq data to TPM values
2.0 years ago
Manav • 0

Hi all, I recently received my RNA-seq data, which includes both 'gene_count_matrix' and 'transcript_count_matrix' files. I have performed the DGE analysis in edgeR using the gene_count_matrix file.

I would like to get a transcripts per million (TPM) matrix for downstream analysis. Can anyone explain how I can do that, or direct me to where I can find more information on this?

Thanks a lot.

TPM edgeR • 1.7k views

Hi Manav,

Which tool did you use to estimate the abundance of your genes/transcripts? For example RSEM, kallisto, or salmon.

Bests,

Rodo


Hi, I am not sure. We outsourced the sequencing to a company, which gave us raw counts for both genes and transcripts. This is what they mention in their methods:

After the final transcriptome was generated, StringTie and ballgown was used to estimate the expression levels of all transcripts.


All right,

After reading the StringTie reference manual, it looks like the tool can produce a raw count matrix as input for DESeq or edgeR. In that case, I suggest estimating the TPM for each gene/transcript from your raw count matrix, normalizing by feature length. To calculate the length of each gene, use the biomaRt package to retrieve the start and end coordinates of the genes.
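As a rough illustration of the arithmetic, here is a minimal Python sketch for one sample. The function name `counts_to_tpm`, the IDs, and the toy numbers are all made up for the example, and it assumes you already have a length in bases for every feature in the count matrix:

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw counts for ONE sample to TPM.

    counts:     dict of feature ID -> raw read count
    lengths_bp: dict of feature ID -> feature length in bases
    (Both inputs are hypothetical; adapt to your own matrix.)
    """
    # Step 1: reads per kilobase (RPK) = count / (length in kb)
    rpk = {fid: counts[fid] / (lengths_bp[fid] / 1000.0) for fid in counts}

    # Step 2: "per million" scaling factor = sum of all RPK values / 1e6
    scale = sum(rpk.values()) / 1e6
    if scale == 0:
        # No reads at all in this sample: return zeros instead of dividing by 0
        return {fid: 0.0 for fid in counts}

    # Step 3: TPM = RPK / scaling factor
    return {fid: value / scale for fid, value in rpk.items()}


# Toy data, invented for illustration only
counts = {"tx1": 100, "tx2": 200, "tx3": 300}
lengths = {"tx1": 1000, "tx2": 2000, "tx3": 1000}

tpm = counts_to_tpm(counts, lengths)
for fid, value in tpm.items():
    print(fid, round(value, 1))
```

Each sample (column) is processed separately, so for a full matrix you would apply this to every column in turn. By construction the TPM values within a sample add up to one million.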

Bests,

Rodo


That will work for transcripts, not genes.


Hi swbarnes2, just to confirm: the method suggested by Rodo above will be useful only if I use the transcript counts file, right?


Gene length is not meaningful here, because you don't want to include introns, and if you have multiple transcripts of different lengths present for one gene, you'd have to account for that too. Programs like RSEM and Salmon will do this math for you, but it would be tricky to do it yourself. Transcript-based TPM would be much more straightforward to do yourself. One transcript has one length only.
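If gene-level values are still needed afterwards, one common approach is to sum the TPMs of all transcripts belonging to each gene, which avoids having to define a single "gene length". A minimal Python sketch, assuming you have transcript-level TPMs and a transcript-to-gene mapping (the function name, IDs, and numbers below are invented for illustration):

```python
from collections import defaultdict


def gene_tpm_from_transcripts(transcript_tpm, tx_to_gene):
    """Sum transcript-level TPM values per gene for one sample.

    transcript_tpm: dict of transcript ID -> TPM
    tx_to_gene:     dict of transcript ID -> gene ID
    (Both inputs are hypothetical; build them from your own annotation.)
    """
    gene_tpm = defaultdict(float)
    for tx, value in transcript_tpm.items():
        # Each transcript contributes its TPM to its parent gene
        gene_tpm[tx_to_gene[tx]] += value
    return dict(gene_tpm)


# Toy data, invented for illustration only
transcript_tpm = {"tx1": 200000.0, "tx2": 200000.0, "tx3": 600000.0}
tx_to_gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_tpm = gene_tpm_from_transcripts(transcript_tpm, tx_to_gene)
print(gene_tpm)
```

Because the gene values are plain sums of transcript TPMs, they still add up to one million per sample.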


Hi, yes, I will try using the transcript counts as you advised. Thanks.
