Which gene length should be used to calculate Transcripts Per Million (TPM)?
4.9 years ago
modarzi

Hi. To calculate Transcripts Per Million (TPM) from TCGA HTSeq-count data, I need gene lengths. I used GENCODE v22 for annotation, which has several columns for each gene. Here is one record from the annotation file as an example:

feature  start     end       score  strand  frame  gene_id            gene_name
gene     32818018  32897826  .      +       .      ENSG00000206557.5  TRIM71

full_length  exon_length  exon_num  first_exon         last_exon
79809        8685         4         ENSE00001538095.1  ENSE00001498538.5

one_transcript     one_transcript_start  one_transcript_end
ENST00000383763.5  32818018              32897826

As you can see, each Ensembl gene has both a full_length and an exon_length. Which of these should I use as the 'gene length' when calculating TPM?
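For reference, TPM divides each gene's read count by its length and then rescales so that all values sum to one million: TPM_i = 10^6 × (c_i / L_i) / Σ_j (c_j / L_j), where c_i is the count and L_i the length of gene i. A minimal Python sketch, assuming the counts and lengths are already matched by gene ID; the function name `tpm` and all numbers are hypothetical:

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to Transcripts Per Million (TPM).

    counts     -- raw read counts per gene (e.g. one HTSeq-count column)
    lengths_bp -- one length per gene, in the same order as counts
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths must have the same length")
    # Step 1: divide each count by its gene length (reads per base)
    rates = [c / l for c, l in zip(counts, lengths_bp)]
    # Step 2: rescale so the per-gene rates sum to one million
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


# Hypothetical counts and lengths for three genes
counts = [100, 300, 50]
lengths = [79809, 8685, 2000]
values = tpm(counts, lengths)
print([round(v, 1) for v in values])
print(round(sum(values)))  # → 1000000
```

Whichever length column is chosen, the same column should be used for every gene so that the ratios are comparable.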

Tags: RNA-Seq, Normalization, TPM
4.9 years ago

Hello again,

Assuming that you have read counts per gene (not per exon), please use full_length.

Kevin
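The full_length value in the record is the genomic span end - start + 1 (GTF coordinates are 1-based and inclusive), so it includes introns: 32897826 - 32818018 + 1 = 79809. For comparison, htseq-count counts reads overlapping `exon` features by default, and a common alternative is to use the length of the union of a gene's exons. A minimal Python sketch of both; the helper names `span_length` and `union_exon_length` and the exon intervals are hypothetical, not TRIM71's real exons:

```python
def span_length(start, end):
    """Genomic span of a feature in bp (1-based, inclusive coordinates)."""
    return end - start + 1


def union_exon_length(exons):
    """Total bp covered by the union of (start, end) exon intervals.

    Exons shared or overlapping between transcripts are counted once.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(exons):
        if cur_end is None or start > cur_end + 1:
            # Disjoint from the running block: close it and open a new one
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the running block
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


# TRIM71 coordinates from the record (one_transcript_start, end)
print(span_length(32818018, 32897826))  # → 79809

# Hypothetical exons, for illustration only
print(union_exon_length([(100, 200), (150, 300), (500, 600)]))  # → 302
```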


Thanks. Dear Dr. Blighe, I have another question related to this problem. As you said, I should use full_length to calculate TPM, so in my example the full_length of TRIM71 is 79809. Should I use 79809 or 79.809? In other words, should the gene length be in bp or kbp, and is 79809 in bp or kbp? Best regards


Hi, you can use bp, so 79809.
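For TPM specifically, either unit gives identical values: dividing every length by 1,000 multiplies every count/length ratio, and therefore their sum, by the same factor, which cancels in the final rescaling. (The unit does matter for RPKM/FPKM, which are defined per kilobase.) A minimal Python sketch with hypothetical counts and lengths:

```python
import math


def tpm(counts, lengths):
    # Reads per unit length, then rescale so the values sum to one million
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


# Hypothetical counts; the same lengths expressed in bp and in kbp
counts = [100, 300, 50]
lengths_bp = [79809, 8685, 2000]
lengths_kbp = [l / 1000 for l in lengths_bp]

tpm_bp = tpm(counts, lengths_bp)
tpm_kbp = tpm(counts, lengths_kbp)

# Equal up to floating-point rounding
print(all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(tpm_bp, tpm_kbp)))  # → True
```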
