Question

Identifying genes associated with a binary classification for many cell lines

5.8 years ago

bimlay2 ▴ 30

I have gene-wise expression data for 35 cell lines with ~3 runs per cell line. I also have a binary classification for each cell line associated with a biological phenomenon.

I am interested in finding the genes that are most associated with the binary classification. I have tested several approaches, but I wanted to ask if anyone had insight into these sorts of problems.

So far I have:

Generated univariate AUC scores for each gene, which essentially gives a measure of how separated the binary groups are for each gene.
Used an array of binary classifiers and subsequent variable importance analysis to generate ranked gene importance.
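For readers who want to see the first approach concretely, here is a minimal sketch of a per-gene AUC ranking. It assumes the replicate runs are averaged so that each cell line contributes one value per gene, and it ranks by |AUC - 0.5| so that separation in either direction counts. The data layout, function names, and toy values are all hypothetical, not taken from the original analysis.

```python
from statistics import mean


def auc(pos, neg):
    """Fraction of (positive, negative) pairs where the positive value is
    larger, counting ties as one half. Equals Mann-Whitney U / (n1 * n0)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_genes_by_auc(expr, labels):
    """expr: {gene: {cell_line: [run values]}}; labels: {cell_line: 0 or 1}.
    Returns [(gene, (auc, |auc - 0.5|)), ...] sorted by separation."""
    scores = {}
    for gene, by_line in expr.items():
        # Assumption: average replicate runs so each cell line counts once.
        pos = [mean(v) for cl, v in by_line.items() if labels[cl] == 1]
        neg = [mean(v) for cl, v in by_line.items() if labels[cl] == 0]
        a = auc(pos, neg)
        # Distance from 0.5 treats up- and down-regulation symmetrically.
        scores[gene] = (a, abs(a - 0.5))
    return sorted(scores.items(), key=lambda kv: kv[1][1], reverse=True)


# Toy, made-up values purely for illustration.
labels = {"A": 1, "B": 1, "C": 0, "D": 0}
expr = {
    "gene1": {"A": [9.0, 9.2], "B": [8.5, 8.7], "C": [5.0, 5.1], "D": [4.0, 4.4]},
    "gene2": {"A": [5.0, 5.2], "B": [3.0, 3.2], "C": [4.0, 4.2], "D": [6.0, 6.2]},
}
ranked = rank_genes_by_auc(expr, labels)
```

Averaging runs first is one possible choice; scoring every run separately would treat runs from the same cell line as independent observations, which they probably are not.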

Am I missing an obvious method? Do my approaches so far make sense?

RNA-Seq R • 1.0k views


You say you are interested in finding the genes most associated with the binary classification (as opposed to building a predictor of your binary class?). If this is a gene selection question, one alternative would be a differential expression approach: e.g. limma or equivalent with your binary classes as the contrast, ranking the genes by the largest and/or most significant differences between the two classes.

5.8 years ago by Ahill ★ 2.0k

Thanks for your comment. I actually used DESeq2 to generate DE results. The mean-dispersion trend looked weird, and I got super, super low p-values. I wasn't sure if any DE method was suited for 35 cell lines lumped into two groups.

5.8 years ago by bimlay2 ▴ 30

Ah, OK. If 'biological.phenomenon' is a binary label on each cell line (not an experimental factor that you modulated), then I suppose it is heavily confounded with cell.line effects. If cell.line effects are large (probably) but there are still 'biological.phenomenon' main effects large enough to observe against that background, then perhaps a rank-based approach, such as a per-gene univariate Mann-Whitney test comparing the two levels of 'biological.phenomenon', would be worth a try.
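A rough sketch of what such a per-gene Mann-Whitney test could look like is below. It is written in plain Python rather than R so that it stays self-contained, and it uses the normal approximation with tie and continuity corrections rather than an exact test. The function names, the data layout, and the choice to average replicate runs per cell line are assumptions made for illustration only.

```python
import math
from statistics import mean


def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation,
    with tie correction and continuity correction. Returns (U_x, p)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average 1-based rank for tied values
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u1, 1.0  # all values identical: no evidence of a difference
    z = max((abs(u1 - mu) - 0.5) / math.sqrt(var), 0.0)
    return u1, math.erfc(z / math.sqrt(2))


def mann_whitney_for_gene(by_line, labels):
    """by_line: {cell_line: [run values]}; labels: {cell_line: 0 or 1}."""
    # Assumption: collapse replicate runs so the cell line is the unit of analysis.
    x = [mean(v) for cl, v in by_line.items() if labels[cl] == 1]
    y = [mean(v) for cl, v in by_line.items() if labels[cl] == 0]
    return mann_whitney(x, y)


# Toy, made-up data: 10 hypothetical cell lines with 3 runs each.
labels = {f"line{i}": int(i < 5) for i in range(10)}
by_line = {f"line{i}": [10.0 - i, 10.2 - i, 9.9 - i] for i in range(10)}
u, p = mann_whitney_for_gene(by_line, labels)
```

Collapsing to one value per cell line keeps the sample size at the number of cell lines rather than the number of runs, so replicate runs from the same line are not counted as independent evidence.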

ADD REPLY • link 5.8 years ago by Ahill ★ 2.0k