Happy holidays!
I am currently studying the paper "RNA-Seq gene expression estimation with read mapping uncertainty".
If I understand correctly, in the generative model the probability that a read comes from a given transcript is equal to the abundance of that transcript: p(G_n = i | θ) = θ_i. So if a 1 kb transcript and a 10 kb transcript are expressed at the same level (TPM), their model would predict roughly equal numbers of reads for the two transcripts.
However, because of fragmentation, in "real" RNA-Seq the 10 kb transcript would produce about 10x more fragments, and therefore about 10x more reads.
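To make the arithmetic concrete, here is a minimal Python sketch. It assumes fragments are sampled uniformly along transcripts, so each transcript's share of reads is proportional to its molar abundance times its length (effective-length and fragment-size corrections are ignored). It also shows one way a read-level fraction θ can be converted back to a per-transcript fraction by dividing by length and renormalizing; whether the paper's θ_i is meant in this read-level sense is an assumption to check against the paper itself. The helper names are hypothetical.

```python
# Minimal sketch, assuming uniform fragmentation and no effective-length
# correction. Helper names are hypothetical, not from the paper.

def expected_read_fractions(tpm, lengths):
    """Expected fraction of reads from each transcript.

    Each transcript contributes fragments in proportion to tpm * length.
    """
    weights = [t * l for t, l in zip(tpm, lengths)]
    total = sum(weights)
    return [w / total for w in weights]


def read_to_transcript_fractions(theta, lengths):
    """Convert read-level fractions (theta) into per-transcript fractions
    by dividing out transcript length and renormalizing."""
    per_length = [th / l for th, l in zip(theta, lengths)]
    total = sum(per_length)
    return [p / total for p in per_length]


if __name__ == "__main__":
    tpm = [100.0, 100.0]        # same expression level
    lengths = [1_000, 10_000]   # 1 kb and 10 kb transcripts
    n_reads = 1_100_000

    theta = expected_read_fractions(tpm, lengths)
    print([round(f * n_reads) for f in theta])  # [100000, 1000000]

    # Dividing by length recovers equal per-transcript abundance.
    print(read_to_transcript_fractions(theta, lengths))  # roughly [0.5, 0.5]
```

Under these assumptions, equal TPM gives a 10:1 read ratio, while the length-normalized fractions come out equal.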
Am I missing something here, or is the model in that paper wrong?
This makes perfect sense. Thank you!