pre-filtering expression data
9.1 years ago

Hello guys,

I have normalized RNA-seq data as well as methylation data for a couple of hundred samples, and each sample has a couple of hundred thousand features. Before I do feature selection, I need to pre-filter the features so that at least 90% of the useless ones are removed. What method is best for that? Is there an R script or package that does this?

differential-expression • 2.3k views

I think the most important question is: what counts as a useless feature? Do you mean a feature that does not respond to a treatment or condition? In that case, you can run a differential expression analysis between conditions to preselect features. Or do you want to see whether there is a relationship between your methylation and RNA-seq data? Then you could set up a correlation matrix between the RNA-seq counts and the methylation peaks (assuming it is ChIP-seq?) and keep only the features with high enough correlations, similar to an eQTL analysis.
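To make the correlation idea concrete, here is a minimal sketch in plain Python (not R, and not taken from any package). It assumes you already have a list of candidate gene/methylation-feature pairs, for example features near each gene, since correlating every feature against every other is rarely practical at this scale. The names `pearson`, `correlated_pairs`, `min_abs_r` and the toy data are all hypothetical, and the 0.7 cutoff is only a placeholder.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists of per-sample values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant feature has no defined correlation; treat it as 0 here.
        return 0.0
    return sxy / sqrt(sxx * syy)

def correlated_pairs(expr, meth, pairs, min_abs_r=0.7):
    """Keep (gene, methylation feature, r) for pairs with |r| >= min_abs_r.

    expr and meth map feature names to per-sample values, with samples in
    the same order in both. pairs is an assumed, precomputed list of
    candidate (gene, methylation feature) pairs.
    """
    kept = []
    for gene, site in pairs:
        r = pearson(expr[gene], meth[site])
        if abs(r) >= min_abs_r:
            kept.append((gene, site, r))
    return kept

# Toy data: four samples, two genes, two methylation features.
expr = {"g1": [1, 2, 3, 4], "g2": [1, 2, 3, 4]}
meth = {"m1": [0.8, 0.6, 0.4, 0.2], "m2": [0.5, 0.1, 0.5, 0.1]}
kept = correlated_pairs(expr, meth, [("g1", "m1"), ("g2", "m2")])
```

In this toy example only the g1/m1 pair survives, because methylation falls steadily as expression rises. On real data a rank-based correlation and multiple-testing correction would probably be worth considering.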


Aside from removing features that are not expressed at all (simple R commands to do that are easy to find), you can filter based on variance or median absolute deviation (MAD). For instance, the M3C package includes a function to do this; see section 5.2 of the package vignette (https://bioconductor.org/packages/devel/bioc/html/M3C.html).

It is also relatively simple to write the commands yourself.
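As a rough illustration of how little code this takes, here is a minimal sketch in plain Python (not R, and not the M3C function). It drops features that never rise above a floor value, ranks the rest by MAD, and keeps the top fraction. The names `mad`, `prefilter`, `keep_fraction` and `min_expr` and the toy data are hypothetical; a 10% keep fraction would correspond to removing 90% of the features.

```python
from math import ceil
from statistics import median

def mad(values):
    """Median absolute deviation of one feature across samples."""
    centre = median(values)
    return median([abs(v - centre) for v in values])

def prefilter(features, keep_fraction=0.10, min_expr=0.0):
    """Return names of the most variable features, most variable first.

    features maps a feature name to its per-sample values. Features whose
    maximum never exceeds min_expr are dropped first; the rest are ranked
    by MAD and the top keep_fraction (rounded up) are kept.
    """
    expressed = {name: vals for name, vals in features.items()
                 if max(vals) > min_expr}
    ranked = sorted(expressed, key=lambda name: mad(expressed[name]),
                    reverse=True)
    n_keep = ceil(len(ranked) * keep_fraction)
    return ranked[:n_keep]

# Toy data: four samples, four features.
features = {
    "geneA": [0, 0, 0, 0],  # never expressed, dropped first
    "geneB": [5, 5, 5, 5],  # expressed but constant, MAD = 0
    "geneC": [1, 9, 2, 8],  # most variable
    "geneD": [4, 5, 4, 6],
}
selected = prefilter(features, keep_fraction=0.5)
```

Swapping `mad` for a variance function gives the variance-based version. MAD is less sensitive to a single outlying sample, which may matter with a few hundred samples.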
