It depends on why you want that categorization and which statistic you want to use for your test. From a statistics point of view, the best and easiest case to explain is when you have control data and can transform it with basic functions to be approximately normally distributed. You can then compute the mean and standard deviation and decide, for example, that everything more than 2 standard deviations from the mean is low/high. Using a Q-Q plot at that point to remove outliers is a way to clean the data a bit. On that plot you may see part of the distribution with a different mean and standard deviation. This is usually due to noise, and you can remove it or correct for it. With RNA-seq you probably have some sort of FPKM or similar measure. A log transform is one thing to try; at least, this is what I try first with RNA-seq data. Sometimes this cannot be done.
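A minimal sketch of the log-transform and 2-standard-deviation idea, assuming FPKM-like values stored in a plain dictionary. The function name `classify_expression`, the pseudocount of 1, and the gene names and values are illustrative assumptions, not part of any package or real data set.

```python
import math
import statistics


def classify_expression(fpkm_by_gene, pseudocount=1.0, n_sd=2.0):
    """Label genes 'low', 'medium' or 'high' on a log2 scale.

    Assumption: log2(value + pseudocount) is roughly normal, so genes
    further than n_sd standard deviations from the mean are the extremes.
    """
    # Log-transform; the pseudocount avoids log2(0) for unexpressed genes.
    log_values = {
        gene: math.log2(value + pseudocount)
        for gene, value in fpkm_by_gene.items()
    }
    mean = statistics.mean(log_values.values())
    sd = statistics.stdev(log_values.values())
    lower = mean - n_sd * sd
    upper = mean + n_sd * sd

    labels = {}
    for gene, value in log_values.items():
        if value < lower:
            labels[gene] = "low"
        elif value > upper:
            labels[gene] = "high"
        else:
            labels[gene] = "medium"
    return labels


# Hypothetical values: 20 mid-range genes plus one extreme on each side.
example_fpkm = {f"gene_{i}": 255.0 for i in range(20)}
example_fpkm["gene_silent"] = 0.0
example_fpkm["gene_abundant"] = 65535.0

example_labels = classify_expression(example_fpkm)
```

The cutoff of 2 standard deviations is only the example used above; it is worth checking the Q-Q plot first, because the thresholds mean little if the transformed values are far from normal.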
Thanks for the info. I thought maybe there were some predetermined criteria. For example, genes with counts less than 5 or 10 are considered lowly expressed according to the edgeR manual. I wanted to know the criteria for the other levels, but it seems I have to calculate them from control data, which I don't have.
The reason I'm looking for these levels is that I want to examine which of my two DEG detection models does better at detecting highly, medium, and lowly expressed genes as DE.
Thank you
I see you do not have any controls for normalization. A way around this is to use a subset of genes that are stably expressed across samples and conditions. Usually these are housekeeping genes. Normalize your data based on them.
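One way this could look, as a rough sketch rather than an established pipeline: scale each sample so that the geometric mean of its housekeeping genes matches the average across samples. The function name `normalize_by_housekeeping`, the choice of ACTB and GAPDH as the stable set, and all numbers are assumptions for illustration; which genes are really stable has to be checked in your own data.

```python
import statistics


def normalize_by_housekeeping(expr_by_sample, housekeeping_genes):
    """Rescale samples using a set of presumed stably expressed genes.

    expr_by_sample: {sample: {gene: value}}, with positive values for
    every housekeeping gene (the geometric mean needs values > 0).
    """
    # Geometric mean of the housekeeping genes within each sample.
    hk_level = {
        sample: statistics.geometric_mean(
            [genes[g] for g in housekeeping_genes]
        )
        for sample, genes in expr_by_sample.items()
    }
    # Reference level: geometric mean of the per-sample levels.
    reference = statistics.geometric_mean(list(hk_level.values()))

    normalized = {}
    for sample, genes in expr_by_sample.items():
        factor = hk_level[sample] / reference
        normalized[sample] = {g: v / factor for g, v in genes.items()}
    return normalized


# Hypothetical data: sample_b was sequenced about twice as deeply.
example_expr = {
    "sample_a": {"ACTB": 100.0, "GAPDH": 400.0, "GENE_X": 50.0},
    "sample_b": {"ACTB": 200.0, "GAPDH": 800.0, "GENE_X": 100.0},
}
example_norm = normalize_by_housekeeping(example_expr, ["ACTB", "GAPDH"])
```

In this made-up example the two samples differ only by a constant factor, so GENE_X ends up at the same level in both after scaling.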
Oh, OK. Thanks a lot. I haven't done it before, but I'll try to see if I can handle it.