-is Poisson a better approximation for high expressors than for low expressors (i.e., are low expressors more overdispersed than high expressors)?
-is it possible that overdispersion results from bottlenecks in the sample prep process that leave only small numbers of low-expressor molecules at points upstream of the final read-sampling? It's not clear what the effective molecule population sizes are for low expressors during poly-A selection, etc. If rare mRNAs were being sampled from small populations at one or more upstream bottlenecks, wouldn't we expect a compounding of Poissons from the successive samplings (each stage sampling from the random output of the previous one), rather than the gamma-Poisson mixture that gives the negative binomial? We would also predict more overdispersion for low expressors, and that the overdispersion would be observable between libraries prepared from the same sample, but not within repeated sequencings of a single library, which should look Poisson because only read-sampling is applied. Does anyone know if this conjecture is consistent with observations?
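The bottleneck conjecture above can be probed with a toy simulation. This is only a sketch under strong assumptions that are not from the thread: a single upstream bottleneck whose surviving molecule count is Poisson, followed by Poisson read-sampling proportional to the survivors. The names `simulate_counts`, `fano`, `excess_cv2`, `bottleneck_mean` and `reads_per_molecule` are hypothetical, and the parameter values are arbitrary.

```python
import numpy as np


def simulate_counts(bottleneck_mean, reads_per_molecule, n_reps, rng):
    """Return (between_library, within_library) read counts for one gene.

    Toy model (an assumption, not an established description of library prep):
    one Poisson bottleneck upstream, then Poisson read-sampling.
    """
    # Between libraries: each library passes the bottleneck independently,
    # so the number of surviving molecules is itself a Poisson draw.
    survivors = rng.poisson(bottleneck_mean, size=n_reps)
    between = rng.poisson(reads_per_molecule * survivors)

    # Within one library: the surviving molecule count is fixed and only
    # the read-sampling step is repeated.
    fixed_survivors = max(int(round(bottleneck_mean)), 1)
    within = rng.poisson(reads_per_molecule * fixed_survivors, size=n_reps)
    return between, within


def fano(counts):
    """Variance-to-mean ratio; close to 1 for Poisson counts."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()


def excess_cv2(counts):
    """(variance - mean) / mean^2, i.e. a moment estimate of the
    negative-binomial-style dispersion; close to 0 for Poisson counts."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean()
    return (counts.var(ddof=1) - mean) / mean**2


rng = np.random.default_rng(0)
for label, bottleneck_mean in (("low expressor", 5), ("high expressor", 500)):
    between, within = simulate_counts(bottleneck_mean, 20.0, 20000, rng)
    print(label)
    print("  between libraries: Fano =", round(fano(between), 2),
          " excess CV^2 =", round(excess_cv2(between), 4))
    print("  within one library: Fano =", round(fano(within), 2),
          " excess CV^2 =", round(excess_cv2(within), 4))
```

In this toy model the between-library variance is `c*m + c^2*m` (with `m = bottleneck_mean`, `c = reads_per_molecule`), so the variance-to-mean ratio is `1 + c` regardless of expression level, while the excess squared coefficient of variation is `1/m` and therefore larger for low expressors. Repeated sequencing of one library stays Poisson. Whether real library prep behaves this way is exactly the open question in the post.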
"...the overdispersion would be observable between libraries prepared from the same sample, but not within repeated sequencings of a single library, which should look Poisson because only read-sampling is applied." Yes, this is the consensus view from what I know. Don't know about low vs high expressors.