Question

How to categorize genes by their expression levels in RNA-seq?

0

8.2 years ago

statfa ▴ 790

Hi,

How can we categorize genes by their expression levels? Are there any criteria based on read counts?

Here is a similar question, but it doesn't answer mine.

I wish to categorize genes like this:

low expression

medium expression

high expression

Thanks a lot

Gene Expression RNA-seq categorize • 1.7k views

ADD COMMENT • link updated 8.2 years ago by Petr Ponomarenko ★ 2.8k • written 8.2 years ago by statfa ▴ 790

score 1 · Answer 1 · 2017-02-19

1

8.2 years ago

Petr Ponomarenko ★ 2.8k

It depends on why you want the categorization and which statistic you plan to use for your test. From a statistics point of view, the best and easiest case to explain is when you have control data that you can transform with basic functions to an approximately normal distribution. You then compute the mean and standard deviation and decide that, for example, everything more than 2 standard deviations below the mean is low and everything more than 2 standard deviations above it is high. At that point a Q-Q plot is one way to clean the data a bit by removing outliers. On that plot you may see a part of the distribution with a different mean and standard deviation. This is usually due to noise, and you can remove it or correct for it. With RNA-seq you probably have FPKM or a similar measure, and a log transform is one thing to try. At least that is what I try first with RNA-seq data. Sometimes the data cannot be transformed to normal this way.
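As a rough illustration of the idea above, here is a minimal Python sketch that log-transforms expression values and labels genes by their distance from the mean. The function name `categorize_by_sd`, the pseudocount of 1 and the toy values are assumptions made for the example, and the sketch only makes sense if the transformed values are roughly normal.

```python
import math
import statistics

def categorize_by_sd(fpkm, n_sd=2.0, pseudocount=1.0):
    """Label genes low/medium/high from log2-transformed expression.

    fpkm: dict mapping gene name -> FPKM (or a similar measure).
    n_sd and pseudocount are illustrative defaults, not fixed rules.
    """
    # The pseudocount avoids log2(0) for unexpressed genes.
    log_values = {g: math.log2(v + pseudocount) for g, v in fpkm.items()}
    values = list(log_values.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    lower = mean - n_sd * sd
    upper = mean + n_sd * sd

    labels = {}
    for gene, x in log_values.items():
        if x < lower:
            labels[gene] = "low"
        elif x > upper:
            labels[gene] = "high"
        else:
            labels[gene] = "medium"
    return labels

# Toy data: ten genes at the same level plus two extremes.
example = {f"gene{i}": 15.0 for i in range(10)}
example["silent"] = 0.0
example["abundant"] = 255.0
print(categorize_by_sd(example))
```

With real data it is worth checking the shape of the log-transformed distribution (for example with a Q-Q plot) before trusting cutoffs based on standard deviations.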

ADD COMMENT • link 8.2 years ago by Petr Ponomarenko ★ 2.8k

0

Thanks for the info. I thought maybe there were some predetermined criteria. For example, genes with counts less than 5 or 10 are considered lowly expressed according to the edgeR manual. I wanted to know the criteria for the other levels, but it seems I have to calculate them from control data, and I don't have any.

The reason I'm looking for these levels is that I want to examine which of my two DEG detection models does better at detecting highly, medium and lowly expressed genes as DE.
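One possible way to set up that comparison is sketched below in Python: assign each gene a level from its mean count, then tally how many genes each model calls DE per level. The low cutoff of 10 follows the rule of thumb mentioned above, while the high cutoff of 1000, the function names and the toy data are purely hypothetical placeholders. Without known true DE genes (for example from simulated data) the tallies show where the models differ, not which one is right.

```python
def expression_level(mean_count, low_cutoff=10, high_cutoff=1000):
    """Assign a level from a gene's mean count (cutoffs are assumptions)."""
    if mean_count < low_cutoff:
        return "low"
    if mean_count >= high_cutoff:
        return "high"
    return "medium"

def de_calls_per_level(mean_counts, de_genes, low_cutoff=10, high_cutoff=1000):
    """Count how many DE calls fall into each expression level.

    mean_counts: dict mapping gene -> mean count across samples.
    de_genes: iterable of genes that one model reports as DE.
    """
    tally = {"low": 0, "medium": 0, "high": 0}
    for gene in de_genes:
        level = expression_level(mean_counts[gene], low_cutoff, high_cutoff)
        tally[level] += 1
    return tally

# Toy data: mean counts and the DE calls of two hypothetical models.
mean_counts = {"a": 3, "b": 50, "c": 5000, "d": 8, "e": 200}
model_1_calls = {"a", "b", "c"}
model_2_calls = {"b", "c", "e"}
print("model 1:", de_calls_per_level(mean_counts, model_1_calls))
print("model 2:", de_calls_per_level(mean_counts, model_2_calls))
```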

Thank you

ADD REPLY • link 8.2 years ago by statfa ▴ 790

1

I see you do not have any controls for normalization. One way around this is to use a subset of genes that are stably expressed across samples and conditions. Usually these are housekeeping genes. Normalize your data based on them.
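A minimal Python sketch of that kind of normalization follows. It assumes per-sample scale factors taken from the geometric mean of the housekeeping genes, which is only one of several reasonable choices. The function names are hypothetical, ACTB and GAPDH stand in for whichever genes are actually stable in your data, and the housekeeping counts must be greater than zero.

```python
import math

def housekeeping_scale_factors(counts, housekeeping):
    """Per-sample scale factors from housekeeping genes.

    counts: dict mapping sample -> {gene: count}.
    housekeeping: list of gene names assumed to be stably expressed.
    """
    geo_means = {}
    for sample, genes in counts.items():
        logs = [math.log(genes[g]) for g in housekeeping]
        geo_means[sample] = math.exp(sum(logs) / len(logs))
    # Center the factors so that their geometric mean is 1.
    log_ref = sum(math.log(v) for v in geo_means.values()) / len(geo_means)
    reference = math.exp(log_ref)
    return {s: gm / reference for s, gm in geo_means.items()}

def normalize(counts, factors):
    """Divide every count in a sample by that sample's scale factor."""
    return {
        sample: {g: c / factors[sample] for g, c in genes.items()}
        for sample, genes in counts.items()
    }

# Toy data: sample s2 was sequenced twice as deeply as s1.
counts = {
    "s1": {"ACTB": 100, "GAPDH": 400, "geneX": 50},
    "s2": {"ACTB": 200, "GAPDH": 800, "geneX": 100},
}
factors = housekeeping_scale_factors(counts, ["ACTB", "GAPDH"])
print(normalize(counts, factors))
```

After normalization, genes whose counts differed only because of sequencing depth (geneX in the toy data) end up at the same level in both samples.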

ADD REPLY • link 8.2 years ago by Petr Ponomarenko ★ 2.8k

0

Oh, OK. Thanks a lot. I haven't done it before, but I'll try and see if I can handle it.

ADD REPLY • link 8.2 years ago by statfa ▴ 790