Question

Deep exploration of an expression atlas

0

7.2 years ago

lessismore ★ 1.4k

Dear all,

I want to analyse an expression compendium in depth. Apart from the classical methods, such as
1. plotting the expression distribution of each sample with boxplots or density plots
2. clustering via PCA
3. producing expression heatmaps

I would be interested in other approaches that would let me discover biomarkers for a specific set of samples, or explore the variance of gene expression so that I can focus on specific sets of genes or, conversely, discard sets of genes whose expression does not change meaningfully across samples.

Thanks in advance

R RNA-Seq data exploration • 1.6k views

1

Your question is a little vague. There are many methods you could use, but the choice depends on what exactly you are looking for and what type of data is available. For example, you could use the genefilter Bioconductor package to select genes based on high or low variance.

7.2 years ago by Matina ▴ 250

0

My aim is to focus on genes with interesting (preferred) expression changes that would be useful for prediction purposes.

7.2 years ago by lessismore ★ 1.4k

1

So something like a feature selection analysis? If so, you could look for co-expression patterns among genes in specific samples or groups of samples. Another option is to use machine learning methods such as recursive feature elimination (RFE). For my own analysis I have been using the caret package in R, which is great. Check out the feature selection section of its documentation.
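
As a rough illustration of the RFE suggestion, here is a minimal sketch using caret's `rfe()` with random-forest ranking (`rfFuncs`, which needs the randomForest package installed). The simulated data, the group labels "A"/"B", and the subset sizes are assumptions made up for demonstration. Note that caret expects samples in rows and genes in columns, so a typical genes x samples expression matrix would need `t()` first.

```r
library(caret)

set.seed(1)

# Simulated data (assumption): 60 samples x 30 genes, two sample groups.
n_samples <- 60
n_genes   <- 30
x <- matrix(rnorm(n_samples * n_genes), nrow = n_samples,
            dimnames = list(NULL, paste0("gene", seq_len(n_genes))))
y <- factor(rep(c("A", "B"), each = n_samples / 2))

# Shift the first three genes in group B so they carry signal.
x[y == "B", 1:3] <- x[y == "B", 1:3] + 2
x <- as.data.frame(x)

# Recursive feature elimination with random forests and 5-fold CV.
ctrl <- rfeControl(functions = rfFuncs, method = "cv", number = 5)
fit  <- rfe(x = x, y = y, sizes = c(2, 5, 10), rfeControl = ctrl)

# Genes retained in the best-performing subset.
selected <- predictors(fit)
print(fit)
print(selected)
```

On real data, the candidate subset sizes and the resampling scheme would need tuning, and a prior variance filter can keep the run time manageable when there are thousands of genes.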

Hope this helps.

7.2 years ago by Matina ▴ 250

0

Thank you very much. I've been looking at genefilter, specifically at this:

varFilter(as.matrix(my_expression_data), var.func=IQR, var.cutoff=0.6, filterByQuantile=TRUE)

From what I've read, it calculates the interquartile range (IQR) of each gene, sorts the values, and keeps the genes above the 60th percentile. Is that correct?
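
The reading described above (per-gene IQR, then keep the genes whose IQR exceeds the 60th percentile of all IQRs) can be sketched in base R as follows. This is only an illustration of that logic, not genefilter's own code: the simulated matrix and all variable names are assumptions, and genefilter may estimate the IQR slightly differently from `stats::IQR`, so results on real data could differ marginally. No explicit sort is needed, since `quantile()` gives the cutoff directly.

```r
set.seed(42)

# Simulated log-expression matrix (assumption): 100 genes x 12 samples.
expr <- matrix(rnorm(100 * 12, mean = 8, sd = 1), nrow = 100,
               dimnames = list(paste0("gene", 1:100),
                               paste0("sample", 1:12)))

# Make the first 10 genes more variable across samples.
expr[1:10, ] <- expr[1:10, ] + rnorm(10 * 12, sd = 3)

# Interquartile range of each gene (row).
gene_iqr <- apply(expr, 1, IQR)

# Cutoff at the 60th percentile of the per-gene IQRs.
cutoff <- quantile(gene_iqr, probs = 0.6)

# Keep genes whose IQR is strictly above the cutoff.
keep     <- !is.na(gene_iqr) & gene_iqr > cutoff
filtered <- expr[keep, , drop = FALSE]

dim(filtered)
```

With `var.cutoff = 0.6` this keeps roughly the most variable 40% of genes, here 40 of the 100 simulated ones.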

7.2 years ago by lessismore ★ 1.4k