Question

pre-filtering expression data

0

9.1 years ago

ha.hassanzadeh • 0

Hello guys,

I have normalized RNA-seq data as well as methylation data for a couple of hundred samples, and each sample has a couple of hundred thousand features. Before I do feature selection, I need to pre-filter the features so that at least 90% of the useless ones are removed. Which method is best for that? Is there an R script or package that does this?

differential-expression • 2.3k views

ADD COMMENT • link updated 2.2 years ago by Ram 44k • written 9.1 years ago by ha.hassanzadeh • 0

1

I think the most important question for you is: what counts as a useless feature? Do you mean a feature that is not associated with a treatment or condition? In that case, you can run a differential expression analysis between conditions to "preselect" features. Or maybe you want to see whether there is a relationship between your methylation and RNA-seq data? Then you could set up a correlation matrix between the RNA-seq counts and the methylation peaks (assuming it is ChIP-Seq?) and only look at features with high enough correlations (I am thinking of something similar to an eQTL analysis).
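To make the correlation idea concrete, here is a minimal sketch. It is written in plain Python purely for illustration (in R, `cor()` would do the same job). The feature names, the candidate `pairs` list, and the 0.7 cutoff are all made up for the example, not recommendations. Note that a full all-against-all matrix over a couple of hundred thousand features on each side is very large, so the sketch assumes you already have a list of candidate gene/site pairs, for example sites near each gene.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant feature has no defined correlation; treat it as 0.
        return 0.0
    return sxy / sqrt(sxx * syy)


def correlated_pairs(expression, methylation, pairs, min_abs_r=0.7):
    """Keep (gene, site, r) for candidate pairs with |r| >= min_abs_r.

    expression / methylation: dict of feature name -> values across the
    same samples, in the same sample order.
    """
    kept = []
    for gene, site in pairs:
        r = pearson(expression[gene], methylation[site])
        if abs(r) >= min_abs_r:
            kept.append((gene, site, r))
    return kept


# Toy data with hypothetical names: 5 samples, 2 genes, 2 methylation sites.
expression = {"geneA": [1, 2, 3, 4, 5], "geneB": [3, 1, 4, 1, 5]}
methylation = {"site1": [0.9, 0.7, 0.5, 0.3, 0.1], "site2": [0.5] * 5}
pairs = [("geneA", "site1"), ("geneB", "site2")]

hits = correlated_pairs(expression, methylation, pairs)
for gene, site, r in hits:
    print(gene, site, round(r, 3))
```

In the toy data only the geneA/site1 pair survives, because site2 is constant across samples and so carries no information.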

ADD REPLY • link updated 2.2 years ago by Ram 44k • written 9.1 years ago by Sam ★ 4.8k

0

Aside from removing features that are not expressed at all (simple R commands for that are easy to find), you can filter on variance or median absolute deviation (MAD). For instance, the M3C package includes a function to do this; see section 5.2 of the package vignette (https://bioconductor.org/packages/devel/bioc/html/M3C.html).

It is also relatively simple to write the commands yourself.
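As a rough illustration of what "writing it yourself" could look like, here is a small sketch of the two steps: drop features that are zero in every sample, then keep only the most variable fraction ranked by MAD. It is plain Python with only the standard library, used here for illustration; it is not the M3C function, and the names `prefilter` and `keep_fraction` are hypothetical. The same logic carries over to R with `apply()` over the rows of a matrix, bearing in mind that R's `mad()` applies a scaling constant by default, which does not change the ranking.

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation of a list of numbers."""
    m = median(values)
    return median(abs(v - m) for v in values)


def prefilter(matrix, keep_fraction=0.10):
    """Return the most variable features of `matrix`.

    matrix: dict of feature name -> list of values across samples.
    Features that are zero in every sample are dropped first; the rest
    are ranked by MAD and the top `keep_fraction` of the ORIGINAL feature
    count is kept (rounded down, but at least one feature).
    """
    expressed = {
        name: vals for name, vals in matrix.items()
        if any(v != 0 for v in vals)
    }
    scores = {name: mad(vals) for name, vals in expressed.items()}
    n_keep = max(1, int(len(matrix) * keep_fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)[:n_keep]
    return {name: matrix[name] for name in ranked}


# Toy data with hypothetical feature names: 4 features, 4 samples.
toy = {
    "geneA": [0, 0, 0, 0],          # not expressed anywhere
    "geneB": [5.0, 5.1, 4.9, 5.0],  # expressed, nearly flat
    "geneC": [1.0, 8.0, 2.0, 9.0],  # clearly variable
    "geneD": [2.0, 2.0, 2.0, 2.1],  # expressed, flat
}

kept = prefilter(toy, keep_fraction=0.5)
print(list(kept))  # ['geneC', 'geneB']
```

With the default `keep_fraction=0.10` this removes at least 90% of the features, which matches what the question asks for. Whether MAD or variance is the better score, and whether 10% is the right cutoff, depends on the data.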

ADD REPLY • link 5.4 years ago by chris86 ▴ 400