@harnish is right that Tximport should be able to do transcript-to-gene-level conversion, although I'm not sure whether StringTie or eXpress are sources it handles importing from automatically. I've not heard of Tximport calculating TPMs from counts, but I could be wrong. Still, these things aren't hard to calculate yourself.
TPM from counts
To calculate TPM, first calculate the RPKM/FPKM for each gene. Actually, you only need the RPK/FPK, as the "per million" factor cancels out in the next step (as does the kilobase scaling, so lengths in bases work just as well). It also doesn't matter whether you count pairs (F) or reads (R). RPK is the number of reads (or pairs) mapping to a gene divided by the total exonic length of the gene.
Some more sophisticated algorithms will use an effective length rather than real length.
To convert this to TPM, divide the FPKM (or FPK) of each gene by the sum of FPKM over all genes and multiply by 1 million.
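Written out as a formula, the two steps above look roughly like this. The symbols are my own notation, not from any particular tool: $c_i$ is the count for gene $i$ and $\ell_i$ is its total exonic (or effective) length.

```latex
$$
\mathrm{TPM}_i \;=\; \frac{c_i / \ell_i}{\sum_j c_j / \ell_j} \times 10^{6}
$$
```

Any constant scaling of the counts or the lengths (per million reads, per kilobase) appears in both numerator and denominator, which is why it cancels.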
If I have a dataframe df with three columns gene_id, counts and length, then TPM is calculated:
df$RPK <- df$counts/df$length
df$TPM <- df$RPK*1000000/sum(df$RPK)
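As a quick sanity check, here is a minimal sketch of the same calculation on a toy dataframe. The gene names, counts and lengths are made up purely for illustration.

```r
# Toy data: all values here are invented for illustration only
df <- data.frame(
  gene_id = c("geneA", "geneB", "geneC"),
  counts  = c(100, 200, 300),
  length  = c(1000, 2000, 1000)  # total exonic length in bases
)

# Reads per base (the kilobase factor cancels in the next step)
df$RPK <- df$counts / df$length

# Scale so that the values sum to one million
df$TPM <- df$RPK * 1e6 / sum(df$RPK)

print(df)
print(sum(df$TPM))
```

Here geneA and geneB both end up with a TPM of 200,000 even though geneB has twice the counts, because geneB is also twice as long. geneC gets 600,000, and the column sums to one million.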
Gene TPM from transcript TPM
As TPM is transcripts per million, the gene TPM is simply the sum of the TPMs of all transcripts belonging to that gene.
If our dataframe transcript_tpm has gene_id, transcript_id and TPM, then we calculate gene TPM using dplyr thus:
library(dplyr)
gene_tpm <- group_by(transcript_tpm, gene_id) %>% summarize(TPM = sum(TPM))
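If you'd rather not depend on dplyr, the same sum can be sketched in base R with aggregate(). This is just an alternative to the dplyr line above, and the transcript names and TPM values below are made up for illustration.

```r
# Toy data: IDs and TPM values are invented for illustration only
transcript_tpm <- data.frame(
  gene_id       = c("geneA", "geneA", "geneB"),
  transcript_id = c("tx1", "tx2", "tx3"),
  TPM           = c(10, 5, 20)
)

# Sum transcript TPMs within each gene (base R, no extra packages)
gene_tpm <- aggregate(TPM ~ gene_id, data = transcript_tpm, FUN = sum)

print(gene_tpm)
```

With this toy input, geneA comes out at 15 (10 + 5) and geneB at 20.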
Could you please tell me how you managed to get transcript-level TPM using StringTie?
I need transcript-level TPM, but the StringTie output I get has TPM at the gene level only.