Recently I used StringTie to quantify RNA-Seq reads mapped to transcripts. One gene has two transcripts with exactly the same length and number of exons (and the same assembled structure), yet I found that their coverages were very different from each other.
##transcript
t_id chr strand start end t_name num_exons length gene_id gene_name cov FPKM
77237 chr17 - 7668402 7687538 ENST00000269305.7 11 2579 ENSG00000141510.14 TP53 31.946598 5.549151
77238 chr17 - 7668402 7687538 ENST00000620739.3 11 2579 ENSG00000141510.14 TP53 2.961419 0.514401
I am wondering how StringTie calculates the coverage. By its definition, and if my understanding is correct, the coverage is calculated as

$$\mathrm{cov} = \frac{\sum_{i=1}^{m} \mathrm{seq}_i \times \text{mapped-seq-length}_i}{\text{transcript\_length}}$$

If this is true, I would expect the coverage of these two transcripts to be similar, so why are they so different?
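To make my assumption concrete, here is a minimal Python sketch of the formula as I understand it. It is not StringTie's actual implementation; the function name `naive_coverage` and the input layout (a list of `(read_count, mapped_length)` pairs) are hypothetical, and the numbers are made up for illustration.

```python
def naive_coverage(reads, transcript_length):
    """Per-base coverage under the formula assumed in the question.

    reads: list of (read_count, mapped_length) pairs, i.e. seq_i and
           mapped-seq-length_i in the formula (hypothetical layout).
    transcript_length: total exonic length of the transcript in bases.
    """
    if transcript_length <= 0:
        raise ValueError("transcript_length must be positive")
    # Sum of aligned bases over all reads, divided by transcript length.
    total_aligned_bases = sum(count * length for count, length in reads)
    return total_aligned_bases / transcript_length


# Made-up example: 10 reads, each aligning 100 bases, on a 1000-base transcript.
example_reads = [(10, 100)]
print(naive_coverage(example_reads, 1000))  # 1.0
```

Under this definition, two transcripts with identical structure that receive the same reads would get identical coverage, which is why the numbers above surprise me.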
Did you find a solution anywhere else? We are struggling with the same question, and it is not explained clearly anywhere.
You may want to follow up on this GitHub issue; maybe someone is listening:
https://github.com/gpertea/stringtie/issues/162