Hello all,
this is probably a very obvious question, but I've never dealt with this sort of problem, so I hope you can point me in the right direction.
Imagine we have an array or an annotated and quantified RNA-seq experiment. There are about 24k genes, each with a normalized numerical expression value (or FPKM) assigned to it.
What is the most statistically sound way to automatically classify genes as "expressed" or "not expressed"? People often use an empirical cutoff for this, e.g. an FPKM of 1, but that's not what I'm interested in.
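To be concrete about the empirical approach I want to get away from, here is a minimal sketch. The gene names and FPKM values are made up for illustration, and treating a value exactly at the cutoff as "expressed" is my own assumption:

```python
# Toy data: gene names and FPKM values are invented for illustration only.
fpkm = {"geneA": 0.0, "geneB": 0.4, "geneC": 1.0, "geneD": 12.7, "geneE": 350.2}

CUTOFF = 1.0  # the conventional but arbitrary threshold (FPKM of 1)


def classify(fpkm_by_gene, cutoff=CUTOFF):
    """Label a gene 'expressed' if its FPKM is >= cutoff, else 'not expressed'.

    Using >= rather than > at the boundary is an assumption, not a standard.
    """
    return {
        gene: ("expressed" if value >= cutoff else "not expressed")
        for gene, value in fpkm_by_gene.items()
    }


labels = classify(fpkm)
for gene, label in labels.items():
    print(gene, fpkm[gene], label)
```

The threshold here is fixed in advance rather than derived from the data, which is exactly what I would like to avoid.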
Thank you for any input.