Just to formalize what GenoMax and Kevin Blighe said in their comments:
Categories like low, medium and high are artificial thresholds that have no meaning in nature: nature deals in continua, not categories. That said, dividing things into categories can be useful. The key is that if we are dividing things for convenience's sake, we should divide them into the categories that are most convenient for the particular task at hand.
The way in which the level of a miRNA affects the expression of its targets is a complex relationship that depends not only on the level of the miRNA, but also on the sequence of the target, the transcription rate of the target, and probably the RNA-binding protein context of the target, in ways we are only just beginning to understand. That is, for one miRNA-target pair, a miRNA at 10 CPM might have a very large effect on target expression levels, while for a different miRNA-target pair, the same miRNA expression level might have little or no effect.
One can imagine any number of schemes for dividing miRNAs into categories based on expression. Here are three you might like to consider:
- Tertiles: if you have 1000 miRNAs, rank them, and then divide that ranking into three: the bottom third, the middle third and the top third (about 333 miRNAs each).
- Divide the range. Find the expression level of the most highly expressed miRNA and base your thresholds on that. So if the most highly expressed miRNA has a CPM of 600, you might divide into 0-200 CPM, 200-400 CPM and 400-600 CPM. Note that these are likely to be very differently sized categories. Doing this with log CPMs might be slightly better, but I'd still expect the bottom category to have far more miRNAs than the other ones.
- As you are likely to find that the counts for many miRNAs are close to zero, you might define an "unexpressed" category (0 or 1 read), a "top expressed" category (top 10% of miRNAs) and a third category that contains everything else.
There is no telling ahead of time which of these is best, and it will probably depend on what point you are trying to make, or what hypothesis you are trying to test. I'd probably give all of them a go and see which set of results made the most sense.
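For what it's worth, the three schemes above could be sketched roughly like this in plain Python. This is only an illustration under my own assumptions: the miRNA names and values are made up, the same toy numbers are treated as CPM for the first two schemes and as raw read counts for the third purely for brevity, and the `log2(CPM + 1)` pseudocount in the log variant is my choice, not something fixed by the schemes themselves.

```python
import math

LEVELS = ("low", "medium", "high")

# Hypothetical toy data (made-up names and values).
toy = {"miR-a": 0, "miR-b": 1, "miR-c": 2, "miR-d": 5, "miR-e": 8,
       "miR-f": 20, "miR-g": 45, "miR-h": 90, "miR-i": 300, "miR-j": 600}

def tertile_labels(cpm):
    """Scheme 1: rank the miRNAs and split the ranking into three groups."""
    ranked = sorted(cpm, key=cpm.get)
    n = len(ranked)
    return {name: LEVELS[min(3 * i // n, 2)] for i, name in enumerate(ranked)}

def range_labels(cpm, log=False):
    """Scheme 2: split the range 0..max into three equal-width bins."""
    # Assumption: the log variant uses log2(CPM + 1) so that zeros are defined.
    values = {k: (math.log2(v + 1) if log else v) for k, v in cpm.items()}
    top = max(values.values())
    labels = {}
    for name, v in values.items():
        idx = 0 if top == 0 else min(int(3 * v / top), 2)
        labels[name] = LEVELS[idx]
    return labels

def expressed_labels(counts, top_fraction=0.10, max_unexpressed=1):
    """Scheme 3: unexpressed (0 or 1 read), top 10% of miRNAs, everything else."""
    ranked = sorted(counts, key=counts.get, reverse=True)
    n_top = max(1, round(top_fraction * len(ranked)))
    top = set(ranked[:n_top])
    labels = {}
    for name, c in counts.items():
        if c <= max_unexpressed:
            labels[name] = "unexpressed"
        elif name in top:
            labels[name] = "top"
        else:
            labels[name] = "other"
    return labels

by_tertile = tertile_labels(toy)
by_range = range_labels(toy)
by_log_range = range_labels(toy, log=True)
by_expressed = expressed_labels(toy)
```

Comparing `by_range` with `by_tertile` on your own data should show quickly how lopsided the equal-width bins are.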
I don't think there is a standard for this sort of thing. You will need to decide on those definitions. You could take a look at the distribution of values across samples and see if you can make an informed decision based on actual data.
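One rough way to look at that distribution without any plotting library is a text histogram of log-scaled mean CPM. The sketch below assumes a hypothetical dictionary of per-sample CPM values (the names and numbers are invented) and bins each miRNA by the integer part of `log2(mean CPM + 1)`; both the averaging across samples and the pseudocount are my assumptions.

```python
import math
from collections import Counter

# Hypothetical toy data: CPM per miRNA across three samples.
cpm_by_sample = {
    "miR-a": [0.0, 0.0, 1.0],
    "miR-b": [2.0, 3.0, 1.0],
    "miR-c": [10.0, 14.0, 12.0],
    "miR-d": [100.0, 90.0, 110.0],
    "miR-e": [600.0, 550.0, 650.0],
}

def log_cpm_histogram(cpm_by_sample):
    """Count miRNAs per integer bin of log2(mean CPM + 1)."""
    bins = Counter()
    for values in cpm_by_sample.values():
        mean_cpm = sum(values) / len(values)
        bins[int(math.log2(mean_cpm + 1))] += 1
    return dict(sorted(bins.items()))

hist = log_cpm_histogram(cpm_by_sample)
for b, n in hist.items():
    # One '#' per miRNA falling in this bin.
    print(f"log2(CPM+1) in [{b}, {b + 1}): {'#' * n} ({n})")
```

Gaps or clear modes in a histogram like this, on real data, are the sort of thing that could suggest where to put thresholds.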
Thanks for the answer!
Does anyone know other ways?
Use tertiles, quartiles, or anything up to deciles.
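A minimal sketch of that, assuming a hypothetical dictionary of CPM values (made-up names and numbers) and using the standard library's `statistics.quantiles` to get the cut points. Note that with many tied values, such as lots of zeros, the bins will not come out equally sized.

```python
import bisect
import statistics

# Hypothetical toy data (made-up names and CPM values).
toy = {"miR-a": 0, "miR-b": 1, "miR-c": 2, "miR-d": 5, "miR-e": 8,
       "miR-f": 20, "miR-g": 45, "miR-h": 90, "miR-i": 300, "miR-j": 600}

def quantile_bins(cpm, k):
    """Assign each miRNA to one of k quantile bins (1 = lowest, k = highest)."""
    cuts = statistics.quantiles(cpm.values(), n=k)  # k - 1 cut points
    return {name: bisect.bisect_left(cuts, v) + 1 for name, v in cpm.items()}

tertiles = quantile_bins(toy, 3)
deciles = quantile_bins(toy, 10)
```

Here `k=3` gives tertiles, `k=4` quartiles, and `k=10` deciles.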