I have a dataset of 18 cell lines (HG_U133_plus2), normalized using GCRMA. I have collapsed the dataset to genes by taking the median of the probes for each gene. Before proceeding with any kind of differential expression analysis, how do you deal with more than 20,000 genes?
- Do you remove genes that have low variance across all cell lines? If so, won't genes that are highly expressed in every cell line also be removed? Am I on the right track?
- How do you choose the threshold for the per-gene variance across cell lines?
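
To make the question concrete, here is a minimal sketch of the kind of variance filter I have in mind. It is only an illustration: the function name `filter_low_variance`, the `keep_fraction` parameter, and the toy values are all made up, and keeping a fixed top fraction of genes is just one assumed way of setting the cutoff, not an established recommendation.

```python
from statistics import variance


def filter_low_variance(expr, keep_fraction=0.5):
    """Return the genes with the highest variance across cell lines.

    expr: dict mapping gene name -> list of log2 expression values,
          one value per cell line (hypothetical input layout).
    keep_fraction: fraction of genes to keep, ranked by variance
          (an assumed way of choosing the threshold).
    """
    # Sample variance of each gene across the cell lines
    variances = {gene: variance(values) for gene, values in expr.items()}
    # Rank genes from most to least variable
    ranked = sorted(variances, key=variances.get, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]


# Toy data (made up): 4 genes measured in 4 cell lines
toy = {
    "high_flat": [12.0, 12.1, 11.9, 12.0],
    "low_flat": [2.0, 2.1, 1.9, 2.0],
    "high_variable": [8.0, 12.0, 9.0, 13.0],
    "low_variable": [2.0, 5.0, 3.0, 6.0],
}

kept = filter_low_variance(toy, keep_fraction=0.5)
print(kept)
```

With this toy input, `high_flat` is dropped even though it is strongly expressed in every cell line, which is exactly the behaviour I am asking about in the first bullet.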
Thanks