I have a set of candidate new annotations for a genome, derived from proteomics data.
I would like to support them with RNA-Seq data. My initial idea was to take all the trusted annotations and study their RPKM distribution (population A), and do the same for regions of the genome with no annotation (population B). With these two populations, I would calculate the probability of each candidate belonging to population A (expressed) or population B (not expressed). This approach seems excessively complex to me just for deciding whether a region is expressed or not.
Is there any tool or approach that assigns a probability of being expressed per base or per annotation? If not, does the approach I suggested make sense?
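To make the two-population idea concrete, here is a minimal sketch of how it could be computed. It is only an illustration under several assumptions that are not part of the question: RPKM values are log2-transformed with a pseudocount of 1, each population is modelled as a single normal distribution, and both populations get an equal prior. The function names (`log_rpkm`, `normal_pdf`, `prob_expressed`) and the example values are hypothetical.

```python
import math
from statistics import mean, stdev


def log_rpkm(values, pseudocount=1.0):
    """log2-transform RPKM values; the pseudocount keeps zeros finite."""
    return [math.log2(v + pseudocount) for v in values]


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def prob_expressed(candidate_rpkm, annotated_rpkm, unannotated_rpkm,
                   prior_expressed=0.5):
    """Posterior probability that a candidate belongs to population A.

    Population A = trusted annotations (expressed),
    population B = unannotated regions (not expressed).
    Assumes each population is roughly normal on the log2(RPKM + 1) scale.
    """
    a = log_rpkm(annotated_rpkm)
    b = log_rpkm(unannotated_rpkm)
    x = math.log2(candidate_rpkm + 1.0)

    like_a = normal_pdf(x, mean(a), stdev(a))
    like_b = normal_pdf(x, mean(b), stdev(b))

    # Bayes' rule with two classes.
    num = prior_expressed * like_a
    den = num + (1.0 - prior_expressed) * like_b
    if den == 0.0:
        # Both likelihoods underflowed; fall back to the prior.
        return prior_expressed
    return num / den


# Made-up RPKM values, for illustration only.
annotated = [12.0, 30.5, 8.2, 55.0, 20.1, 15.3]
unannotated = [0.0, 0.3, 0.1, 0.8, 0.2, 0.5]

print(prob_expressed(25.0, annotated, unannotated))  # close to 1
print(prob_expressed(0.2, annotated, unannotated))   # close to 0
```

With real data the unannotated population will probably contain some genuinely expressed regions, so the normal assumption and the equal prior would both need checking before trusting the resulting probabilities.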